I am trying to develop a speech / sound recognition program that extracts useful acoustic features such as the fundamental frequency, MFCCs, and the spectral centroid. Speech is usually segmented into frames of 20 to 30 ms, and the analysis window is shifted by 10 ms.
I would like to find a patch / object, or some useful advice, on how I can achieve window segmentation with whatever frame length, shift, and step I prefer for sound analysis and segmentation.
Does anybody know a way to do this?
You could try
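As one possible starting point, here is a minimal sketch of overlapping-frame segmentation in Python with NumPy; the 25 ms frame, 10 ms hop, 16 kHz sample rate, and the `frame_signal` helper are all illustrative assumptions, not something from the original answer:

```python
import numpy as np

def frame_signal(x, sr, frame_ms=25.0, hop_ms=10.0):
    """Split a 1-D signal into overlapping frames.

    frame_ms / hop_ms are the analysis window length and shift in
    milliseconds; trailing samples that do not fill a whole frame
    are dropped.
    """
    frame_len = int(round(sr * frame_ms / 1000.0))
    hop_len = int(round(sr * hop_ms / 1000.0))
    n_frames = 1 + (len(x) - frame_len) // hop_len
    # Build a (n_frames, frame_len) index matrix: each row starts
    # hop_len samples after the previous one.
    idx = (np.arange(frame_len)[None, :]
           + hop_len * np.arange(n_frames)[:, None])
    frames = x[idx]
    # Apply a Hamming window to each frame before spectral analysis
    return frames * np.hamming(frame_len)

sr = 16000
x = np.random.randn(sr)      # 1 s of noise as a stand-in signal
frames = frame_signal(x, sr)
print(frames.shape)          # one row per 25 ms frame, hop of 10 ms
```

Changing `frame_ms` and `hop_ms` gives you whatever frame / shift combination you prefer; each row of the result can then be fed to an FFT, MFCC, or centroid computation.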