Are there any known pitch detection algorithms for detecting multiple specific musical notes in audio representing polyphonic music?
All the algorithms I see referenced for polyphonic music, like MUSIC or ESPRIT, are focussed on the open-ended domain, where you don't know what pitches the audio contains and are trying to use the algorithm to detect them. That's understandably a very difficult problem.
I'm instead interested in a more constrained domain, where you're given a list of 2-6 specific notes, and you need to check whether those notes exist in the audio. I would think this would be a much easier problem, although still not trivial, but I'm having trouble finding either code or academic papers on the subject.
My domain is an application where a musician plays specific notes on their instrument, and the program gives them feedback on whether they played those notes correctly.
I'm currently playing around with some NodeJS and C++ code to do this, and my current (naive) approach is to:
- Compute the FFT, and bin the frequencies according to the frequency ranges for all the standard musical pitches.
- Calculate the median amplitude across all frequencies to use as a threshold (T) for noise filtering. I ignore any frequency with an amplitude below this as background noise.
- For each note I'm searching for, I calculate the frequencies of the first 3 harmonics, look up the amplitude at each of those frequencies, and if they're all above the threshold, I assume that note is present (see the sketch after this list).
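To make that concrete, here's a stripped-down sketch of the detection step (TypeScript). It assumes the magnitude spectrum has already been computed elsewhere; the FFT itself, windowing, and the function names here are only illustrative:

```typescript
// Sketch of the naive detection step described above.
// Assumes `magnitudes` is the magnitude per FFT bin, already computed.

function midiToFreq(midi: number): number {
  // Equal temperament, A4 = 440 Hz = MIDI 69.
  return 440 * Math.pow(2, (midi - 69) / 12);
}

function median(values: number[]): number {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0
    ? (sorted[mid - 1] + sorted[mid]) / 2
    : sorted[mid];
}

function detectNotes(
  magnitudes: number[],      // magnitude per FFT bin
  sampleRate: number,        // e.g. 44100
  fftSize: number,           // e.g. 8192
  targetMidiNotes: number[]  // the 2-6 notes to check for
): number[] {
  // T: median amplitude used as the noise-floor threshold.
  const threshold = median(magnitudes);

  // Map a frequency in Hz to the nearest FFT bin index.
  const binFor = (freq: number) => Math.round((freq * fftSize) / sampleRate);

  // A note counts as present if its fundamental and the next two
  // harmonics (f0, 2*f0, 3*f0) are all above the threshold.
  return targetMidiNotes.filter((midi) => {
    const f0 = midiToFreq(midi);
    return [1, 2, 3].every((h) => {
      const bin = binFor(h * f0);
      return bin < magnitudes.length && magnitudes[bin] > threshold;
    });
  });
}
```

The problems below all come down to how `threshold` is chosen in that last function.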
This works somewhat, but the problem I'm having is calibrating the T threshold. If it's too high, the check becomes too strict and doesn't detect notes unless they're played very loudly. If it's too low, it's not strict enough and returns false positives.
The underlying difficulty is that for many instruments, the amplitudes of these notes' harmonics don't follow a consistent pattern. Some notes have a big fundamental, with each subsequent harmonic quickly diminishing. Some bass notes have almost no fundamental, with higher harmonics that diminish very slowly. So when I find a T threshold that works well for treble notes, it doesn't work for bass notes, and vice versa.
And since I'm using the median amplitude for noise filtering, when two notes are played together at unequal volumes, the louder note raises the median enough that the softer note can get filtered out, even if the softer note is still much louder than any other pitch in the FFT. It's difficult to find any sweet spot.
Are there any signal processing or filtering techniques that I should use in this situation to improve accuracy?