I have an audio file of 10 seconds in length. If I generate the spectrogram using matplotlib, then I get a different number of timesteps as compared to the spectrogram generated by librosa.
Here is the code:
import numpy as np
import librosa
import matplotlib.pyplot as plt

fs = 8000
nfft = 200
noverlap = 120
hop_length = 120
# librosa.load returns a (samples, sample_rate) tuple
audio, _ = librosa.load(path, sr=fs)
# Spectrogram generated using matplotlib
spec, freqs, bins, _ = plt.specgram(audio, NFFT=nfft, Fs=fs, noverlap=noverlap)
print(spec.shape) # (101, 5511)
# Using librosa
spectrogram_librosa = np.abs(librosa.stft(audio,
                                          n_fft=nfft,
                                          hop_length=hop_length,
                                          win_length=nfft,
                                          window='hann')) ** 2
spectrogram_librosa_db = librosa.power_to_db(spectrogram_librosa, ref=np.max)
print(spectrogram_librosa_db.shape) # (101, 3676)
Can someone explain why there is such a large difference in the number of time steps, and how to make sure that both generate the same output?
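This is because the noverlap argument of plt.specgram is the number of points by which consecutive audio segments overlap, whereas hop_length is the step between the starts of consecutive segments. With nfft = 200 and noverlap = 120, plt.specgram advances by nfft - noverlap = 80 samples per segment, while librosa advances by hop_length = 120 samples, which is why matplotlib produces about 1.5 times as many time steps. To compare like with like, pass noverlap = nfft - hop_length = 80 to plt.specgram. That being said, there is still a 2-point difference between the two results, but this is most likely due to how the boundaries are handled: librosa.stft pads the signal by default (center=True), while plt.specgram does not.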
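To make the arithmetic concrete, here is a rough sketch that only counts frames, without loading any audio. The helper names specgram_frames and stft_frames are made up for illustration and are not part of either library. The signal length of 441000 samples is an assumption, chosen only because it reproduces the shapes reported in the question.

```python
def specgram_frames(n_samples, nfft, noverlap):
    """Frame count for matplotlib-style segmentation (no padding)."""
    step = nfft - noverlap  # samples between segment starts
    return (n_samples - nfft) // step + 1


def stft_frames(n_samples, n_fft, hop_length, center=True):
    """Frame count for a librosa-style STFT.

    With center=True the signal is padded by n_fft // 2 on both sides.
    """
    if center:
        n_samples += 2 * (n_fft // 2)
    return (n_samples - n_fft) // hop_length + 1


# Assumed length (hypothetical): consistent with the shapes in the question.
n_samples = 441000
nfft, hop_length = 200, 120

# Question's settings: noverlap = 120 means a step of only 80 samples.
print(specgram_frames(n_samples, nfft, noverlap=120))
print(stft_frames(n_samples, nfft, hop_length))

# Matching the step sizes: noverlap = nfft - hop_length = 80.
print(specgram_frames(n_samples, nfft, noverlap=nfft - hop_length))

# Dropping librosa's padding should remove the remaining boundary difference.
print(stft_frames(n_samples, nfft, hop_length, center=False))
```

Under these assumptions, setting noverlap = nfft - hop_length brings the two counts to within two frames of each other, and passing center=False to librosa.stft should make them equal. It is worth verifying this on the real audio, since the window and scaling defaults of the two functions may still differ.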