How can I convert an AudioSegment to a NumPy array and back?


As the title states, I am having difficulty converting a PyDub AudioSegment to a NumPy array and back. I know how to convert an AudioSegment to a NumPy array and have a hazy idea of how to go the other way, but the methods I have found vary and do not pair with each other. So, how can I reliably convert an AudioSegment to an array and back?

This is the code I used to get the array:

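from pydub import AudioSegment
import numpy as np

audio = AudioSegment.from_file("/file/path/sillysong.wav")
data = audio.get_array_of_samples()  # interleaved samples as an array.array
data = np.array(data)
data = data.reshape(audio.channels, -1, order='F')  # one row per channel
data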

I do not know how to turn an array in this form back into an AudioSegment. For context, I am using TensorFlow and I need the data to be in array form. Thank you for your help! (I'm a new coder, so there's probably something obvious I'm missing.)

Answer by Tino D (best answer)

Your approach is correct. As an example, I read a file called LowRider.wav using pydub:

from pydub import AudioSegment
# Jupyter magic; remove the next line when running as a plain script
%matplotlib notebook
import matplotlib.pyplot as plt
import numpy as np

audio = AudioSegment.from_file("LowRider.wav")
data = np.array(audio.get_array_of_samples())
data = data.reshape(audio.channels, -1, order='F')
print("Shape of the converted numpy array:", data.shape)

frame_rate = audio.frame_rate
num_samples = data.shape[1]
# Sample i is played at i / frame_rate seconds
time_vector = np.arange(num_samples) / frame_rate

plt.figure()
plt.plot(time_vector, data[0,:], "-",  label = "Channel 1")
plt.plot(time_vector, data[1,:], "--", label = "Channel 2")
plt.legend()
plt.xlabel("Time (s)")
plt.ylabel("Signal")
plt.show()

This gives you data, which holds the samples from the two channels, one row per channel. Here is the plot of the two: (image: plot of both channels)
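
To see why order='F' matters here, below is a small NumPy-only sketch. The sample values and variable names are made up for illustration and do not come from LowRider.wav. It assumes the samples arrive interleaved as L0, R0, L1, R1, ..., which is how get_array_of_samples() lays out stereo audio. The column-major reshape splits them into one row per channel, and the same flag on the way back restores the interleaving.

```python
import numpy as np

# Hypothetical interleaved stereo samples: L0, R0, L1, R1, L2, R2
interleaved = np.array([10, -10, 20, -20, 30, -30], dtype=np.int16)
channels = 2

# Column-major fill puts each (L, R) pair into one column,
# so row 0 is the left channel and row 1 is the right channel.
per_channel = interleaved.reshape(channels, -1, order='F')
print(per_channel)

# Flattening with the same order restores the original interleaving.
restored = per_channel.reshape(-1, order='F')
print(np.array_equal(restored, interleaved))
```

With the default order='C', the first half of the samples would land in row 0 and the second half in row 1, which mixes the two channels together.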

To convert back to an AudioSegment, use the following code. I included a .wav export so you can test whether the conversion succeeded:

# Flatten back to interleaved samples (L0, R0, L1, R1, ...)
reshaped_data = data.reshape(-1, order='F')

new_audio = AudioSegment(
    reshaped_data.tobytes(),
    frame_rate=audio.frame_rate,
    sample_width=reshaped_data.dtype.itemsize,
    channels=audio.channels
)

new_audio.export("LowRider_Exported.wav", format="wav")

Change the name of the file and let me know if it works :D
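
One thing to watch for, since you mentioned TensorFlow: the conversion back uses dtype.itemsize as the sample width, so it only works while the array still has its original integer dtype. If your pipeline turns the samples into float32, itemsize becomes 4 and the audio will come out wrong. Here is a minimal sketch of one way to handle that, assuming 16-bit audio. The helper names to_float32 and to_int16 are my own, not part of pydub or NumPy.

```python
import numpy as np

def to_float32(samples_int16):
    """Scale int16 samples to float32 in [-1.0, 1.0)."""
    return samples_int16.astype(np.float32) / 32768.0

def to_int16(samples_float):
    """Scale floats back to int16, clipping to the valid range."""
    scaled = np.round(samples_float * 32768.0)
    return np.clip(scaled, -32768, 32767).astype(np.int16)

# Made-up samples shaped (channels, samples), as in the answer above
data = np.array([[0, 16384, -32768], [32767, -1, 1]], dtype=np.int16)

as_float = to_float32(data)   # feed this to the model
back = to_int16(as_float)     # convert back before rebuilding the audio

print(back.dtype.itemsize)
print(np.array_equal(back, data))
```

After to_int16, the array can go through the same reshape(-1, order='F') and tobytes() steps as above. For other sample widths the scale factor and integer dtype would need to change accordingly.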