How to perform a Cepstrum for pitch detection

1k views Asked by At

Ok, there are a bunch of questions on this here, and plenty of reading material on google, yet I somehow am not able to figure this out. I want to get the fundamental frequency of a segment of speech. The basic steps are supposed to be:

  • take the FFT of a windowed signal
  • convert the FFT from rectangular to polar coordinates (so you can get the magnitude)
  • discard the phase information
  • take the square, then the natural log of each bin of the magnitude
  • take another FFT (or some sources say take the inverse fft?)

Here is how I have implemented this in AS3:

var signal:Vector.<Number> = my1024PointSignal; // an audio signal 1024 samples long
var imx:Vector.<Number> = new Vector.<Number>(signal.length); // 1024 point vector to hold imaginary part of fft

hammingWindow(signal); // window it
zeroFill(imx); // fill imx with zeros

FFT(signal, imx); // convert signal into real and imaginary components of fft

toPolar(signal, imx); // convert fft to polar coordinates

// square each bin, and take the log of each bin, discard phase
for (var i:int = 0, l:int = signal.length; i < l; i++) {
    signal[i] = Math.log(Math.pow(signal[i], 2));
    imx[i] = 0;
}

FFT(signal, imx); // or maybe inverseFFT(signal, imx), i don't know

Now when I do this and end by taking the FFT, when I plot it the bins appear to be in reverse order? I also am seeing a larger peak at the second harmonic than at the fundamental. When I do this and take the inverse FFT, I get an audio signal that looks reflected around N/2, and again the peaks seem to be reversed. The whole thing is also quite noisy. What am I doing wrong?

1

There are 1 answers

2
ederwander On BEST ANSWER

For Cepstrum i always have used to this steps:

  1. Apply hamming windows in the signal (1024 or 2048 points)
  2. Apply FFT
  3. Get magnitude
  4. use just the first half values
  5. Convert to log scale
  6. Apply IFFT
  7. Find the Peak

The equation for cepstrum:

 IFFT(log(abs(FFT(s))))

Maybe are you are seeing reflected because you not get the step four(4)

Difference is between ending with an IFFT and ending with an FFT?

The difference is the scale representation, if do you end using FFT you need extract just the real information, for both below equations you will get the same shape:

IFFT(log(abs(FFT(s)))) == real(FFT(log(abs(FFT(s)))))

Plot example from cepstrum:

For IFFT(log(abs(FFT(s)))):

enter image description here

For real(FFT(log(abs(FFT(s))))):

enter image description here

This is a cepstrum example from a 4096 points sine in 440hz sampled at 44100hz