I am trying to find a way to use MATLAB to compare the likeness of short (500 millisecond) recordings of the same note played on different instruments.
Going into detail on this specific topic: I am a music student who has been given the task of objectively determining the tone of various modern low brass instruments, in order to decide which instrument should replace the obsolete "ophicleide", or bass keyed bugle. I first tried visually comparing spectrograms of it and 6 other instruments, but that approach was too subjective.
I recorded all of the instruments with the same microphone, equipment, gain levels, and the same notes. For this reason, I believe that the signals are similar enough to use MATLAB tools.
I believe that comparing the fft is going to be the most accurate calculation. I tried at first a freq-domain correlation, and tested different segments of the same tone (eu and eu2 being variables):
>> corr(abs(fft(eu)),abs(fft(eu2)))
ans = 0.9963
That is a step in the right direction, but I seem to get the opposite of what I expect when I compare different instruments. The euphonium and ophicleide sound almost identical, yet:
>> corr(abs(fft(eu)),abs(fft(ophi)))
ans = 0.5242
The euphonium and bass clarinet sound completely different, but this shows a higher correlation:
>> corr(abs(fft(eu)),abs(fft(basscl)))
ans = 0.8506
I also tried a normalized maximum cross-correlation magnitude formula that I found online, but I am getting the same pattern:
>> norm_max_xcorr_mag = @(x,y)(max(abs(xcorr(x,y)))/(norm(x,2)*norm(y,2))); x =eu2; y = eu; norm_max_xcorr_mag(x,y)
ans = 0.9638
I get a similar result when comparing the other samples
>> norm_max_xcorr_mag = @(x,y)(max(abs(xcorr(x,y)))/(norm(x,2)*norm(y,2))); x = eu; y = basscl; norm_max_xcorr_mag(x,y)
ans = 0.6825
compared to
>> norm_max_xcorr_mag = @(x,y)(max(abs(xcorr(x,y)))/(norm(x,2)*norm(y,2))); x = eu; y = ophi; norm_max_xcorr_mag(x,y)
ans = 0.3519
The Euphonium and Bass Clarinet (basscl) have a completely different sound, and completely different harmonic series, but these formulas are showing closer correlation than the Euphonium and Ophicleide, whose frequency bands look almost like an identical match.
I am worried that these correlations are mostly measuring true pitch (I am playing the same note on all of these instruments, but the Ophicleide might be out of tune by up to 1 Hz). They could also be accounting for phase, or even total amplitude.
Does anyone know of a better, clear-cut method for comparing the proportions of the harmonic overtones of these complex waveforms?
Or am I barking up the wrong tree?
With respect to your specific question, the quantity you've computed is essentially the maximum value of the spectral coherence function. The problem is that the spectral coherence is only a good measure of the correlation between two signals if the signals are statistically stationary, that is, if the probability distribution of frequencies in each signal does not vary with time.
Unfortunately, musical instrument note signals are not likely to be stationary, because the very features most important in classifying the difference between how the same note "sounds" to the human ear on different instruments are due to harmonics and modulations that are more than likely time varying over the duration of the note.
So rather than using the spectral coherence, you need a frequency domain or time-frequency domain metric that better captures the similarity between the non-stationary parts of the note spectra.
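As a rough illustration of that kind of metric, here is a minimal sketch that measures the relative strength of the first few harmonics in each short frame and compares those harmonic profiles, instead of correlating raw FFT bins. It is only one possible approach, not a standard recipe. Everything specific in it is an assumption made for the example: the sample rate `fs`, the fundamental `f0`, the number of harmonics `nHarm`, the frame sizes, the search tolerance `tol`, and the synthetic tones `x`, `y` and `z` that stand in for real recordings. It uses only base MATLAB functions.

```matlab
% Sketch only: fs, f0, nHarm, tol, the frame sizes and the synthetic
% tones x, y, z are illustrative assumptions, not values from the post.
fs = 44100; f0 = 110; nHarm = 8;         % sample rate, fundamental, harmonics
t  = (0:round(0.5*fs)-1)' / fs;          % 500 ms time axis

% Harmonic amplitudes: x and y share a profile, z is odd-harmonic heavy.
aX = [1.0 0.6 0.4 0.25 0.15 0.10 0.05 0.03];
aZ = [0.3 0.0 1.0 0.00 0.80 0.00 0.50 0.00];
x = zeros(size(t)); y = x; z = x;
for k = 1:nHarm
    x = x + aX(k)     * sin(2*pi*k*f0*t);
    y = y + 0.5*aX(k) * sin(2*pi*k*(f0+1)*t);   % 1 Hz sharp, half as loud
    z = z + aZ(k)     * sin(2*pi*k*f0*t);
end

frameLen = 4096; hop = 2048;             % analysis frame and hop, in samples
win   = 0.5 - 0.5*cos(2*pi*(0:frameLen-1)'/frameLen);  % Hann window
binHz = fs / frameLen;                   % FFT bin spacing in Hz
tol   = 0.03;                            % search +/-3% around each harmonic
sigs  = {x, y, z};
nFrames = floor((numel(t) - frameLen)/hop) + 1;
H = zeros(nHarm, nFrames, numel(sigs));  % harmonic x frame x signal

for s = 1:numel(sigs)
    for m = 1:nFrames
        seg = sigs{s}((m-1)*hop + (1:frameLen)) .* win;
        mag = abs(fft(seg));
        for k = 1:nHarm
            lo = floor(k*f0*(1-tol)/binHz) + 1;  % +1: MATLAB indices start at 1
            hi = ceil(k*f0*(1+tol)/binHz) + 1;
            H(k,m,s) = max(mag(lo:hi));          % peak near the k-th harmonic
        end
        H(:,m,s) = H(:,m,s) / norm(H(:,m,s));    % discard overall loudness
    end
end

% Cosine similarity of the unit-norm harmonic profiles, averaged over frames.
simXY = mean(sum(H(:,:,1) .* H(:,:,2), 1));
simXZ = mean(sum(H(:,:,1) .* H(:,:,3), 1));
fprintf('same profile: %.3f, different profile: %.3f\n', simXY, simXZ);
```

Because each harmonic is located by a peak search in a small band and each frame is normalized, a 1 Hz tuning offset, the phase, and the overall level should have little effect on the score, while a different balance of overtones lowers it. With real recordings you would load the signals (for example with `audioread`) and estimate `f0` rather than assume it. Averaging over frames is the crudest choice; keeping `H` as a harmonic-by-frame matrix lets you compare how the overtones evolve during the note.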
At this point, it's less of a problem of which MATLAB functions to select (although a look at this example from the Signal Processing Toolbox documentation may help you get started, if you have that toolbox). It is more a question of researching signal processing and feature classification techniques. Here you really have to go to the literature on musical acoustics. Here is just one abstract link - I don't have access to the ACM but you may have access through your university if you are a student.
Good luck with what sounds like an interesting problem!