JS pitch shift with timbre control

457 views Asked by At

i need a good pitch shift solution for my project to change the voice. a lot of pitch shift js libraries around - tried them all but they don't provide the desired result. main thing is no control on the result voice timbre and i get Mickey mouse or hell zombie sounding stuff but not real voices with it. while here the result is just outstanding if to test with vega's voice: http://www.sonicapi.com/docs/live-task-demo?task=process-elastiqueTune#demo_form unfortunately i'm total zero with audio processing and wanna know at least how it's done, what a type of shifting algorythm is used here and how we can achieve timbre/formant control over the process. any hints highly appreciated. thanks ;)

1

There are 1 answers

0
Sami Hult On

This question touches a very broad subject. Here's a few pointers.

Pitch can be, in general, shifted by offsetting the frequencies that form the voice material. An easy version of this is resampling in temporal domain, where in essential the recording is played back in a different speed. This naturally leads to a tempo change as well which is often not desirable.

In order to preserve the tempo, you need to "explode" the material into its components, in other words, make a domain change from temporal domain to frequency domain. This is what Fourier Transform is for. Once done, you have an estimate of set of frequencies (and respective phases if properly done in complex space) per sample.

The perceived timbre of the voice depends on the relative amplitudes of the frequency set called overtones. Overtones are formed in the speaker's vocal tract and to the listener, heard together with the fundamental frequency. You can control the timbre using different filters in either time domain, spectral (frequency) domain or cepstral domain. This kind of signal processing is a subject for a library section full of books.

You can move from back from the spectral (frequency) domain to the temporal (time) domain using inverse Fourier transform.

To sum up, the naive approach to shift the pitch you need to transform the samples from temporal to spectral domain, resample along the time axis, and then do the inverse Fourier transform to get back to the time domain.

Besides Fourier transform, you could use wavelets. I hope this gets you started.