Understanding MFCC output for a simple sine wave

I generate a simple sine wave with a frequency of 200 Hz and calculate an FFT to check that the frequency I get back is correct.

Then I calculate the MFCCs, but I do not understand what the output means. How should the output be interpreted, and where do I see the frequency of 200 Hz in it?

import numpy as np
import matplotlib.pyplot as plt
import scipy.fft
import librosa

def generate_sine_wave(freq, sample_rate, duration):
    x = np.linspace(0, duration, int(sample_rate * duration), endpoint=False)
    frequencies = x * freq
    # 2pi because np.sin takes radians
    y = np.sin(2 * np.pi * frequencies)
    return x, y

sample_rate = 1024
freq = 200
x, y = generate_sine_wave(freq, sample_rate, 1)
plt.figure(figsize=(10, 4))
plt.plot(x, y)
plt.grid(True)

fft = scipy.fft.fft(y)
fft = fft[0 : len(fft) // 2]
fft = np.abs(fft)
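# Bin k of the FFT corresponds to k * sample_rate / len(y) Hz, so the
# Nyquist frequency itself must be excluded from the axis
xs = np.linspace(0, sample_rate / 2, len(fft), endpoint=False)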
plt.figure(figsize=(10, 4))
plt.plot(xs, fft)
plt.grid(True)

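# librosa returns an array of shape (n_mfcc, n_frames)
mfcc_feat = librosa.feature.mfcc(sr=sample_rate, y=y)
print('\nMFCC Parameters:\n   Window Count              =', mfcc_feat.shape[1])
print('   Individual Feature Length =', mfcc_feat.shape[0])
# Transpose so that each row is one frame and each column is one coefficient
mfcc_feat = mfcc_feat.T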
plt.matshow(mfcc_feat)
plt.title('MFCC Features - librosa')

[images: sine wave plot, FFT magnitude plot, and MFCC plot for the 200 Hz sine wave]

If I change the frequency to 400 Hz, MFCC gives me this:

[image: MFCC plot for the 400 Hz sine wave]

What is the meaning of all these colors in three rows?

There is 1 answer

Answer by igrinis

Individual MFCCs are generally not interpretable, so plotting and studying them is not very useful: it is hard to relate a change in the coefficients of a given time frame to what happened in the frequency bins of the original signal. You can get a good explanation of how MFCCs are computed in this.
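To illustrate why a single coefficient does not map to a single frequency, here is a minimal toy sketch. It relies on the fact that MFCCs are a discrete cosine transform (DCT) of the log energies in the mel bands. The 8 band values below are made up for illustration, and the DCT is unnormalized, so the numbers will not match librosa's scaling.

```python
import math

def dct_ii(values):
    """Unnormalized DCT-II: c[k] = sum_n values[n] * cos(pi * k * (n + 0.5) / N)."""
    n_vals = len(values)
    return [
        sum(v * math.cos(math.pi * k * (n + 0.5) / n_vals)
            for n, v in enumerate(values))
        for k in range(n_vals)
    ]

# Hypothetical log energies (dB) in 8 mel bands: one loud band on a quiet floor.
low_tone = [-80.0] * 8
low_tone[1] = 0.0    # energy concentrated in a low band

high_tone = [-80.0] * 8
high_tone[3] = 0.0   # same energy, moved to a higher band

ceps_low = dct_ii(low_tone)
ceps_high = dct_ii(high_tone)

# Print the two sets of cepstral coefficients side by side.
for k, (a, b) in enumerate(zip(ceps_low, ceps_high)):
    print(f"c[{k}]: {a:9.2f}  {b:9.2f}")
```

In this toy case, moving the tone to another band leaves c[0] (the overall level) unchanged but changes every other coefficient at once, so no single coefficient "is" the frequency.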

Meanwhile, here is a very short explanation of what you see and why. Your signal y is split into frames, and for each frame you get 20 coefficients (the librosa default for n_mfcc). The default hop_length is 512 samples, so after padding, your 1-second sequence (1024 samples) is converted into 3 frames. Because you transpose the result before plotting, the three rows of the image are these three frames and the 20 columns are the coefficients. As your signal is essentially static in the frequency domain, the MFCCs do not change much from frame to frame, hence the colors vary very little down each column.
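The frame count can be checked with a little arithmetic. The sketch below assumes librosa's defaults (n_fft=2048, hop_length=512, center=True, which pads n_fft // 2 samples on each side); `count_frames` is a hypothetical helper written for this illustration, not a librosa function.

```python
def count_frames(n_samples, n_fft=2048, hop_length=512, center=True):
    """Estimate how many analysis frames a signal of n_samples yields."""
    if center:
        # Centered framing pads n_fft // 2 samples at both ends of the signal.
        n_samples += 2 * (n_fft // 2)
    if n_samples < n_fft:
        # Simplification for this sketch: no complete window fits.
        return 0
    return 1 + (n_samples - n_fft) // hop_length

# 1 second at a sample rate of 1024 Hz gives 1024 samples.
print(count_frames(1024))  # 3
```

With centered padding and an even n_fft this reduces to 1 + n_samples // hop_length, which gives the 3 rows seen in the plot.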