In ml5 pitch detection using the CREPE model, how to detect pitch above ~2 kHz


I'm successfully using pitch detection features of ml5:

The issue:

No pitch above roughly 2000 Hz is detected. I tried multiple devices and checked that the sounds are visible on sonograms, so it does not seem to be a mic issue.

I assumed it may be a result of sampling rate limitations / resampling done by the library, as the Nyquist frequency (max "recordable" frequency) is that of half of the sampling rate.

I hosted the ml5 sources locally and tried modifying the PitchDetection class.

There I see the audio seems to be resampled to 1024 Hz for performance reasons. This does not sound right though: if I'm not mistaken, it would only allow detection of frequencies up to 512 Hz. I am definitely missing something (or a lot).

I tried fiddling with the rates, but increasing the value to, say, 2048 causes an error: Error when checking : expected crepe_input to have shape [null,1024] but got array with shape [1,2048].

My question is:

Is there something in the ml5 PitchDetection class that I can modify or configure (perhaps a different model) to detect frequencies higher than 2000 Hz using the CREPE model?


There is 1 answer

Kasparas Anusauskas

After more investigation, it turns out the CREPE model itself only supports pitches up to roughly 2 kHz (per the cent mapping in the code) or 1975.5 Hz (per the paper).

The paper about CREPE: https://arxiv.org/abs/1802.06182

It states:

The 360 pitch values are denoted as c1, c2, ..., c360 and are selected so that they cover six octaves with 20-cent intervals between C1 and B7, corresponding to 32.70 Hz and 1975.5 Hz

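The JS implementation has the mapping below, which spreads the 360 bins over 7180 cents in 20-cent steps, starting at an offset of about 1997.38. Note that these values are in cents, not Hz. As far as I can tell, the code converts cents to frequency as 10 * 2^(cents / 1200), which puts the range at roughly 31.7 Hz to 2005.5 Hz: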

const cent_mapping = tf.add(tf.linspace(0, 7180, 360), tf.tensor(1997.3794084376191))
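To make the range concrete, here is a minimal plain-JavaScript sketch that rebuilds the same mapping without TensorFlow.js and converts its endpoints to Hz. The conversion `10 * 2^(cents / 1200)` is an assumption based on how the CREPE reference code turns cents into frequency, and the helper name `centsToHz` is made up for this example.

```javascript
// Sketch: rebuild the cent mapping in plain JS and convert its ends to Hz.
// Assumption: frequency = 10 * 2^(cents / 1200), as in the CREPE reference code.
const NUM_BINS = 360;
const CENT_SPAN = 7180;                  // linspace(0, 7180, 360) gives 20-cent steps
const CENT_OFFSET = 1997.3794084376191;  // an offset in cents, not a frequency in Hz

// Plain-JS equivalent of tf.add(tf.linspace(0, 7180, 360), tf.tensor(offset))
const centMapping = Array.from(
  { length: NUM_BINS },
  (_, i) => (CENT_SPAN * i) / (NUM_BINS - 1) + CENT_OFFSET
);

// Hypothetical helper: convert a cent value to Hz
function centsToHz(cents) {
  return 10 * Math.pow(2, cents / 1200);
}

const minHz = centsToHz(centMapping[0]);
const maxHz = centsToHz(centMapping[NUM_BINS - 1]);
console.log(`lowest bin: ${minHz.toFixed(1)} Hz, highest bin: ${maxHz.toFixed(1)} Hz`);
```

Under that assumption the highest bin sits just above 2 kHz, which matches the observed ceiling.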

This means that, short of retraining the model, I'm probably out of luck using it for now.


Edit:

After a good night's sleep I found a simple solution that works for my simple application.

In essence, I resample my audio buffer so that its pitch is halved. CREPE then detects a pitch of 440 Hz as 220 Hz, and I just need to multiply the result by 2.

The result is still more consistently correct than the YIN algorithm for my real-time, noisy application.
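The resampling trick can be sketched roughly as follows. This is not the exact code from my application: `stretchBuffer` and `detectHighPitch` are hypothetical names, the linear interpolation is just the simplest resampler that works for illustration, and `detect` stands in for whatever function returns a pitch in Hz for a buffer.

```javascript
// Stretch a mono buffer by `factor` using linear interpolation.
// Analysed at the original sample rate, the stretched signal has a
// pitch that is `factor` times lower than the input.
function stretchBuffer(input, factor) {
  const outLength = Math.floor((input.length - 1) * factor) + 1;
  const output = new Float32Array(outLength);
  for (let i = 0; i < outLength; i++) {
    const pos = i / factor;                          // fractional index into the input
    const i0 = Math.floor(pos);
    const i1 = Math.min(i0 + 1, input.length - 1);   // clamp at the last sample
    const frac = pos - i0;
    output[i] = input[i0] * (1 - frac) + input[i1] * frac;
  }
  return output;
}

// Hypothetical wrapper: `detect` is any function (buffer) => pitch in Hz.
// The detector sees the lowered pitch, so the result is scaled back up.
function detectHighPitch(buffer, detect, factor = 2) {
  const loweredHz = detect(stretchBuffer(buffer, factor));
  return loweredHz * factor;
}
```

With a factor of 2, the detectable range shifts up by an octave, so the upper limit roughly doubles while the lowest detectable pitch doubles as well. Since the stretched buffer is twice as long, only half as many input samples are needed to fill one model frame.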