Although built-in functions may be available, I need to create my own triangular filter bank. My code is below. I'm getting NaN values in my HMatrix (the filter bank) because FreqArray, which is used to build the matrix, contains repeated values. I need help with the following:
- Is the sampling frequency of 44100 Hz that I have chosen correct?
- How should I choose the lower frequency (300 Hz) and upper frequency (8000 Hz) used to calculate the mel filter bank matrix?
- How should I choose a suitable frame size (frame_length) and number of mel filters (no_of_coeffs)?
function TriFilterBank()
tic
%-----------------------------INITIALISATION---------------------------%
fs=44100; %frequency at which I have sampled my recorded samples
frame_length=256; %How to choose an appropriate frame-size?
low_freq=300; %lower frequency for calculation of mel frequency filter bank (I'm unable to choose a correct one, and find the criteria for choosing it)
high_freq=8000; %upper frequency for calculation of mel frequency filter bank (I'm unable to choose a correct one, and find the criteria for choosing it)
% I have also tried with (fs/2)=22050Hz, but no good results
no_of_coeffs=20; % This is the no. of mel filters to create. How to choose an appropriate value for this for speech processing applications?
%--------------------------------------------------PRE-PROCESSING FOR MEL FILTER BANK CREATION-----------------------------------------------%
low_linear=2595*log10(1+(low_freq/700));
high_linear=2595*log10(1+(high_freq/700));
band_length=(high_linear-low_linear)/(no_of_coeffs+1);
MelArray(no_of_coeffs+2,1)=zeros(); %to store mel frequencies to calculate mel frequency filter bank
LinearArray(no_of_coeffs+2,1)=zeros(); %to store linear frequencies to calculate mel frequency filter bank
FreqArray(no_of_coeffs+2,1)=zeros(); %to store frequency array to calculate mel frequency filter bank
%{
THIS ARRAY MAY HAVE WRONG VALUES DUE TO SELECTION of WRONG PARAMETERS LIKE low_freq, high_freq, frame_length (frame-size), no_of_coeffs (no. of filter banks). THIS IS MAJOR REASON BEHIND GENERATION OF NaN values in HMatrix
%}
HMatrix(no_of_coeffs,frame_length)=zeros(); %Hmk Matrix/ Filter Bank I'M VERY DOUBTFUL OF THE VALUES GENERATED BY THIS FILTER BANK
MelArray(1)=low_linear;
MelArray(no_of_coeffs+2)=high_linear;
LinearArray(1)=low_freq;
LinearArray(no_of_coeffs+2)=high_freq;
FreqArray(1)=floor((int32(frame_length)+1)*LinearArray(1)/fs);
FreqArray(no_of_coeffs+2)=floor((int32(frame_length)+1)*LinearArray(no_of_coeffs+2)/fs);
for m=1:no_of_coeffs
    MelArray(m+1)=MelArray(m)+band_length;
    LinearArray(m+1)=700*((power( 10,MelArray(m+1)/2595))-1);
    FreqArray(m+1)=floor((int32(frame_length)+1)*LinearArray(m+1)/fs); %The values generated here seem to be doubtful, hence maybe an incorrect filter bank
end
% THE MOST DOUBTFUL/WRONG PART i.e. MEL FREQUENCY FILTER BANK MATRIX CREATION
%---------------------------------------------------------PROBABLE ERRONEOUS PART------------------------------------------------------------%
% I'M GETTING NaN values in this matrix probably due to choosing incorrect parameters for like upper freq, lower freq, frame-size, no.of filter banks, sampling frequency etc.
% In FreqArray I'm getting two same values, hence it's satisfying none of the below conditions and generating a NaN value.
for k=1:frame_length
    for m=1:no_of_coeffs
        if(k<FreqArray(m))
            HMatrix(m,k)=0;
        elseif (FreqArray(m)<=k && k<=FreqArray(m+1))
            HMatrix(m,k)=(k-FreqArray(m))/(FreqArray(m+1)-FreqArray(m));
        elseif(FreqArray(m+1)<=k && k<=FreqArray(m+2))
            HMatrix(m,k)=(FreqArray(m+2)-k)/(FreqArray(m+2)-FreqArray(m+1));
        elseif (k>FreqArray(m+2))
            HMatrix(m,k)=0;
        end
    end
end
%--------------------------------------------------------------------------------------------------------------------------------------------%
save('TriFilterBank');
toc
end
The code is based on the below equation:
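Written out to match the loop above, with f(i) standing for FreqArray(i) and m running from 1 to no_of_coeffs, the piecewise form that the code implements is as follows. The indexing is read off the code; references often write the same thing with f(m-1), f(m), f(m+1).

```latex
H_m(k) =
\begin{cases}
0, & k < f(m) \\[4pt]
\dfrac{k - f(m)}{f(m+1) - f(m)}, & f(m) \le k \le f(m+1) \\[8pt]
\dfrac{f(m+2) - k}{f(m+2) - f(m+1)}, & f(m+1) \le k \le f(m+2) \\[8pt]
0, & k > f(m+2)
\end{cases}
```

When two neighbouring entries of f are equal, one of the denominators is zero and the numerator is zero at the same k, which gives 0/0.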
The main parts of the output of the above code are shown below for reference.
For reference I have used the following website:
Thanks in advance!
A 44100 Hz sampling frequency is fine. Speech is captured well at a 16 kHz sampling rate, though, so 16 kHz is the more common choice; the blog post you used for reference uses 16 kHz as well.
The 300 Hz to 8000 Hz range is not the best, but it is OK for most applications. For high-quality sound the range is from 20 Hz to 7600 Hz.
The frame size for speech is usually around 25 milliseconds. That is short enough for the signal to be roughly stationary within one frame and long enough to give useful resolution for normal-rate speech. At a 44100 Hz sampling rate this comes to about 1102 samples per frame (44100 * 0.025 = 1102.5), not the 256 you selected. If you want a power of 2, you need 2048 samples per frame. This would be the FFT size too.
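As a quick check of that arithmetic, here is a minimal sketch. The variable name frame_ms is just for illustration, and the 25 ms figure is the rule of thumb above, not a hard requirement.

```matlab
fs = 44100;        % sampling rate in Hz (from the question)
frame_ms = 25;     % frame duration in milliseconds (rule of thumb)

% Number of samples in one frame, rounded to a whole sample
frame_length = round(fs * frame_ms / 1000);

% Smallest power of two that can hold one frame; usable as the FFT size
nfft = 2^nextpow2(frame_length);

fprintf('frame_length = %d samples, nfft = %d\n', frame_length, nfft);
```

Because 1102.5 rounds up, frame_length comes out as 1103 here, and the next power of two is 2048.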
The number of mel filters can be anywhere from 15 to 40. 20 is a good value used in many systems; it has been found to work well experimentally.
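The number of filters and the frame size interact: if the FFT is too short, neighbouring filter edges fall into the same bin, which is where the repeated FreqArray values come from. The sketch below counts repeated edges for two FFT sizes, using the mel formula and the floor((N+1)*f/fs) mapping from the question. The helper names hz2mel, mel2hz, and n_filters are illustrative only.

```matlab
fs = 44100; low_freq = 300; high_freq = 8000; n_filters = 20;

% Mel conversion as used in the question
hz2mel = @(f) 2595 * log10(1 + f / 700);
mel2hz = @(m) 700 * (10 .^ (m / 2595) - 1);

% n_filters + 2 edge frequencies, equally spaced on the mel scale
edges_hz = mel2hz(linspace(hz2mel(low_freq), hz2mel(high_freq), n_filters + 2));

for nfft = [256 2048]
    bins = floor((nfft + 1) * edges_hz / fs);        % FFT bin of each edge
    n_repeats = numel(bins) - numel(unique(bins));   % edges sharing a bin
    fprintf('nfft = %4d: %d repeated bin edges\n', nfft, n_repeats);
end
```

With 256 points each bin is roughly 172 Hz wide, which is wider than the spacing between the lowest mel edges, so some edges coincide. With 2048 points all 22 edges land in different bins.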
It is better to read an existing implementation, since there are many specific details you won't get from a tutorial; VoiceBox is a good one.
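For comparison with your own code, here is a minimal sketch of the triangular filter bank using the parameters discussed above. It is not taken from VoiceBox or any other library. It assumes nfft = 2048, stores only the nfft/2+1 non-redundant bins, and treats bin numbers as 0-based (column k+1 holds bin k). The names hz2mel, mel2hz, n_filters, and H are illustrative.

```matlab
fs = 44100;          % sampling rate in Hz
nfft = 2048;         % FFT size (assumed; power of two covering a 25 ms frame)
low_freq = 300;      % lower band edge in Hz
high_freq = 8000;    % upper band edge in Hz
n_filters = 20;      % number of triangular filters

hz2mel = @(f) 2595 * log10(1 + f / 700);
mel2hz = @(m) 700 * (10 .^ (m / 2595) - 1);

% n_filters + 2 edge points, equally spaced on the mel scale
mel_pts = linspace(hz2mel(low_freq), hz2mel(high_freq), n_filters + 2);
bins = floor((nfft + 1) * mel2hz(mel_pts) / fs);   % 0-based FFT bin numbers

% Guard against the 0/0 case instead of silently producing NaN
if numel(unique(bins)) < numel(bins)
    error('Repeated bin edges: increase nfft or reduce n_filters.');
end

H = zeros(n_filters, nfft / 2 + 1);   % column k+1 holds FFT bin k
for m = 1:n_filters
    left = bins(m); centre = bins(m + 1); right = bins(m + 2);
    for k = left:centre
        H(m, k + 1) = (k - left) / (centre - left);      % rising slope
    end
    for k = centre:right
        H(m, k + 1) = (right - k) / (right - centre);    % falling slope
    end
end
```

Looping only over the bins inside each triangle leaves everything else at zero, so the two "outside the triangle" branches are not needed, and the explicit check turns repeated edges into a clear error message.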