How to create a Triangular (Mel) Filter Bank used in MFCC for speech recognition in MATLAB?


Although built-in functions may be available, I need to create my own triangular filter bank. My code is below. I'm getting NaN values in my HMatrix (the filter bank). This happens because some values in FreqArray, which is used to build the matrix, are identical. I need help with the following issues:

  1. Is the sampling frequency of 44100 Hz that I have chosen correct?
  2. How should I choose the lower frequency (300 Hz) and upper frequency (8000 Hz) used to calculate the Mel filter bank matrix?
  3. How should I choose a suitable frame size (frame_length) and number of mel filters (no_of_coeffs)?

function TriFilterBank()
tic
%-----------------------------INITIALISATION---------------------------%

fs=44100;          %frequency at which I have sampled my recorded samples
frame_length=256;  %How to choose an appropriate frame-size?
low_freq=300;      %lower frequency for calculation of mel frequency filter bank (I'm unable to choose a correct one, and find the criteria for choosing it)
high_freq=8000;    %upper frequency for calculation of mel frequency filter bank (I'm unable to choose a correct one, and find the criteria for choosing it)
                   % I have also tried with (fs/2)=22050Hz, but no good results
no_of_coeffs=20;   % This is no. of Mel-Filter banks to create. How to choose an appropriate value for this for speech processing applications?

%-----------------PRE-PROCESSING FOR MEL FILTER BANK CREATION-----------------%

low_linear=2595*log10(1+(low_freq/700));
high_linear=2595*log10(1+(high_freq/700));
band_length=(high_linear-low_linear)/(no_of_coeffs+1);

MelArray(no_of_coeffs+2,1)=zeros();    %to store mel frequencies to calculate mel frequency filter bank
LinearArray(no_of_coeffs+2,1)=zeros(); %to store linear frequencies to calculate mel frequency filter bank
FreqArray(no_of_coeffs+2,1)=zeros();   %to store frequency array to calculate mel frequency filter bank
%{
THIS ARRAY MAY HAVE WRONG VALUES DUE TO SELECTION of WRONG PARAMETERS LIKE low_freq, high_freq, frame_length (frame-size), no_of_coeffs (no. of filter banks).
THIS IS MAJOR REASON BEHIND GENERATION OF NaN values in HMatrix
%}
HMatrix(no_of_coeffs,frame_length)=zeros(); %Hmk Matrix/ Filter Bank I'M VERY DOUBTFUL OF THE VALUES GENERATED BY THIS FILTER BANK

MelArray(1)=low_linear;
MelArray(no_of_coeffs+2)=high_linear;
LinearArray(1)=low_freq;
LinearArray(no_of_coeffs+2)=high_freq;
FreqArray(1)=floor((int32(frame_length)+1)*LinearArray(1)/fs);
FreqArray(no_of_coeffs+2)=floor((int32(frame_length)+1)*LinearArray(no_of_coeffs+2)/fs);

for m=1:no_of_coeffs
    MelArray(m+1)=MelArray(m)+band_length;
    LinearArray(m+1)=700*((power(10,MelArray(m+1)/2595))-1);
    FreqArray(m+1)=floor((int32(frame_length)+1)*LinearArray(m+1)/fs); %The values generated here seem to be doubtful, hence maybe an incorrect filter bank
end

% THE MOST DOUBTFUL/WRONG PART i.e. MEL FREQUENCY FILTER BANK MATRIX CREATION
%--------------------------PROBABLE ERRONEOUS PART----------------------------%
% I'M GETTING NaN values in this matrix probably due to choosing incorrect parameters like upper freq, lower freq, frame-size, no. of filter banks, sampling frequency etc.
% In FreqArray I'm getting two same values, hence it's satisfying none of the below conditions and generating a NaN value.
for k=1:frame_length
    for m=1:no_of_coeffs
        if(k<FreqArray(m))
            HMatrix(m,k)=0;
        elseif (FreqArray(m)<=k && k<=FreqArray(m+1))
            HMatrix(m,k)=(k-FreqArray(m))/(FreqArray(m+1)-FreqArray(m));
        elseif(FreqArray(m+1)<=k && k<=FreqArray(m+2))
            HMatrix(m,k)=(FreqArray(m+2)-k)/(FreqArray(m+2)-FreqArray(m+1));
        elseif (k>FreqArray(m+2))
            HMatrix(m,k)=0;
        end
    end
end
%-----------------------------------------------------------------------------%
save('TriFilterBank');
toc
end

The code is based on the below equation:

Mel FilterBank Equation
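
The equation image is not reproduced here. Judging from the loop in the code above and from the tutorial linked below, it is presumably the usual triangular filter definition, where f(m) is the FFT bin of the m-th mel-spaced point. The code stores f(m-1), f(m) and f(m+1) as FreqArray(m), FreqArray(m+1) and FreqArray(m+2):

```latex
\[
H_m(k) =
\begin{cases}
0, & k < f(m-1) \\[4pt]
\dfrac{k - f(m-1)}{f(m) - f(m-1)}, & f(m-1) \le k \le f(m) \\[8pt]
\dfrac{f(m+1) - k}{f(m+1) - f(m)}, & f(m) \le k \le f(m+1) \\[8pt]
0, & k > f(m+1)
\end{cases}
\]
```

Written this way, both fractions become 0/0 whenever two neighbouring points f(m-1), f(m) or f(m), f(m+1) land on the same bin.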

The main parts of the output of the above code are shown below for reference.

HMatrix (FilterBank) - Image 1

HMatrix (FilterBank) - Image 2

HMatrix (FilterBank) - Image 3

LinearArray

MelArray

FrequencyArray

For reference I have used the following website:

http://practicalcryptography.com/miscellaneous/machine-learning/guide-mel-frequency-cepstral-coefficients-mfccs/

Thanks in advance!


There are 1 answers

Nikolay Shmyrev On

Is the sampling frequency of 44100 Hz that I have chosen correct?

This frequency is fine. Speech resides below 16 kHz anyway, so 16 kHz is the more common choice. The blog post you used for reference also uses 16 kHz.

How should I choose the lower frequency (300 Hz) and upper frequency (8000 Hz) used to calculate the Mel filter bank matrix?

This range is not the best, but it is OK for most applications. For high-quality sound, the range is from 20 Hz to 7600 Hz.
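
To see what a given range means in practice, here is a minimal sketch that converts the band edges to the mel scale and back, using the same conversion formula as the code in the question. The variable names hz2mel, mel2hz, mel_points and hz_points are only illustrative.

```matlab
% Mel <-> Hz conversions (same formula as in the question's code).
hz2mel = @(f) 2595 * log10(1 + f / 700);
mel2hz = @(m) 700 * (10 .^ (m / 2595) - 1);

low_freq     = 300;    % Hz, lower band edge from the question
high_freq    = 8000;   % Hz, upper band edge from the question
no_of_coeffs = 20;     % number of triangular filters

% no_of_coeffs filters need no_of_coeffs + 2 edge points,
% equally spaced on the mel scale.
mel_points = linspace(hz2mel(low_freq), hz2mel(high_freq), no_of_coeffs + 2);
hz_points  = mel2hz(mel_points);

% The spacing in Hz is smallest at the low end and grows with frequency.
disp(hz_points(1:4));
disp(diff(hz_points(1:4)));
```

Changing low_freq and high_freq to 20 and 7600 shows how the lowest edges move even closer together, which matters for the FFT size discussed next.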

How should I choose a suitable frame size (frame_length) and number of mel filters (no_of_coeffs)?

Frame size for speech is usually around 25 milliseconds; this value is a good compromise between stationarity within one frame and time resolution for speech at a normal rate. At a 44100 Hz sampling rate this comes to about 1102 samples per frame (44100 * 0.025), not 256 as you selected. If you want a power of 2, you need 2048 samples per frame. This would be the FFT size too.
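
The arithmetic can be sketched as follows. The names frame_ms, frame_len and nfft are illustrative, and nextpow2 is the standard MATLAB function that returns the exponent of the next power of two.

```matlab
fs       = 44100;   % sampling rate in Hz, from the question
frame_ms = 25;      % frame duration in milliseconds

frame_len = round(fs * frame_ms / 1000);   % samples per frame
nfft      = 2 ^ nextpow2(frame_len);       % next power of two, used as FFT size

fprintf('frame_len = %d, nfft = %d\n', frame_len, nfft);

% Frequency resolution: spacing between adjacent FFT bins.
fprintf('bin spacing with 256 points: %.1f Hz\n', fs / 256);
fprintf('bin spacing with %d points: %.1f Hz\n', nfft, fs / nfft);
```

With 256 points the bins are wider than the gap between the lowest mel-spaced edges for 20 filters between 300 Hz and 8000 Hz (roughly 100 Hz). Neighbouring edges therefore map to the same bin, which is the likely source of the repeated FreqArray values and the 0/0 NaN entries.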

The number of mel filters can be anywhere from 15 to 40; 20 is a good value used in many systems and has been found experimentally to work well.
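
Putting these values together, a minimal sketch of the filter bank might look like the following. It is not taken from any particular library: the names n_filters, bins and H are illustrative, the bin mapping floor((nfft + 1) * f / fs) is the one used in the question and its reference tutorial, and the duplicate check is an added safeguard.

```matlab
fs        = 44100;   % sampling rate in Hz
nfft      = 2048;    % FFT size (power of two covering a 25 ms frame)
low_freq  = 300;     % Hz
high_freq = 8000;    % Hz
n_filters = 20;      % number of triangular filters

hz2mel = @(f) 2595 * log10(1 + f / 700);
mel2hz = @(m) 700 * (10 .^ (m / 2595) - 1);

% n_filters + 2 edge frequencies, equally spaced on the mel scale.
hz_points = mel2hz(linspace(hz2mel(low_freq), hz2mel(high_freq), n_filters + 2));

% Map each edge frequency to a (zero-based) FFT bin number.
bins = floor((nfft + 1) * hz_points / fs);

% Repeated bins would give 0/0 in the slopes below, so stop early.
if numel(unique(bins)) < numel(bins)
    error('Duplicate bin edges: increase nfft or reduce n_filters.');
end

n_bins = nfft / 2 + 1;            % non-redundant half of the spectrum
H = zeros(n_filters, n_bins);

for m = 1:n_filters
    left   = bins(m);
    center = bins(m + 1);
    right  = bins(m + 2);
    for k = left:center           % rising slope, 0 at left up to 1 at center
        H(m, k + 1) = (k - left) / (center - left);
    end
    for k = center:right          % falling slope, 1 at center down to 0 at right
        H(m, k + 1) = (right - k) / (right - center);
    end
end
```

The k + 1 index accounts for MATLAB's one-based arrays. Setting nfft to 256 in this sketch triggers the error, which corresponds to the NaN values reported in the question.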

It is better to read an existing implementation, since there are many specific details you won't get from a tutorial; VoiceBox is a good one.