Comparing the "Tone" of musical instruments in MATLAB


I am trying to find a way, using MATLAB, to measure how alike short (500 millisecond) recordings of the same note played on different instruments are.

In more detail: I am a music student who has been given the task of objectively characterizing the tone of various modern low brass instruments, in order to determine which instrument should replace the obsolete "ophicleide", or bass keyed bugle. I first tried a visual comparison of its spectrogram against those of 6 other instruments, but that approach was too subjective.

I recorded all of the instruments with the same microphone, equipment, gain levels, and the same notes. For this reason, I believe that the signals are similar enough to use MATLAB tools.

I believe that comparing the FFTs is going to give the most accurate result. I first tried a frequency-domain correlation and tested it on two different segments of the same tone (eu and eu2 are the variables):

>> corr(abs(fft(eu)),abs(fft(eu2)))
ans = 0.9963

That is a step in the right direction, but I get the opposite of what I expect when I compare different instruments. The euphonium and ophicleide sound almost identical, yet:

>> corr(abs(fft(eu)),abs(fft(ophi)))  
ans =   0.5242

The euphonium and bass clarinet sound completely different, but this shows a higher correlation:

>> corr(abs(fft(eu)),abs(fft(basscl)))   
ans = 0.8506

I also tried a normalized maximum cross-correlation magnitude formula that I found online, but I get the same pattern of results:

>> norm_max_xcorr_mag = @(x,y)(max(abs(xcorr(x,y)))/(norm(x,2)*norm(y,2))); x =eu2; y = eu; norm_max_xcorr_mag(x,y)
ans =   0.9638

I get a similar result when comparing the other samples:

>> norm_max_xcorr_mag = @(x,y)(max(abs(xcorr(x,y)))/(norm(x,2)*norm(y,2))); x = eu; y = basscl; norm_max_xcorr_mag(x,y)
ans = 0.6825

compared to

>> norm_max_xcorr_mag = @(x,y)(max(abs(xcorr(x,y)))/(norm(x,2)*norm(y,2))); x = eu; y = ophi; norm_max_xcorr_mag(x,y)
ans = 0.3519

The Euphonium and Bass Clarinet (basscl) have a completely different sound, and completely different harmonic series, but these formulas are showing closer correlation than the Euphonium and Ophicleide, whose frequency bands look almost like an identical match.

I am worried that these correlations are really measuring agreement in exact pitch (I am playing the same note on all of these instruments, but the Ophicleide might be out of tune by up to 1 Hz). They could also be sensitive to phase, or even to total amplitude.

Does anyone know of a better, clear-cut method for comparing the proportions of the harmonic overtones of these complex waveforms?

Or am I barking up the wrong tree?


There are 2 answers

2
paisanco On BEST ANSWER

With respect to your specific question, the quantity you've computed is essentially the maximum value of the spectral coherence function. The problem is that the spectral coherence is only a good measure of the correlation between two signals if the signals are statistically stationary, that is, if the probability distribution of frequencies in the signals does not vary with time.

Unfortunately, musical instrument note signals are not likely to be stationary, because the very features most important in classifying the difference between how the same note "sounds" to the human ear on different instruments are due to harmonics and modulations that are more than likely time varying over the duration of the note.

So rather than using the spectral coherence, you need a frequency domain or time-frequency domain metric that better captures the similarity between the non-stationary parts of the note spectra.
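One way to make that idea concrete is to track the relative strength of each harmonic frame by frame and compare those profiles, rather than correlating whole spectra. The following is only a minimal sketch under stated assumptions, not a validated timbre metric: the sample rate, the nominal pitch `f0`, the frame length, the 3% search window, and the two synthetic stand-in tones are all made up for illustration, and you would substitute your own recordings (for example `eu` and `ophi`). It uses base MATLAB functions only. Taking the largest bin in a small window around each expected harmonic is meant to tolerate a detuning of a hertz or so, and normalizing each frame removes overall loudness.

```matlab
% Sketch: frame-by-frame harmonic profiles of two notes (base MATLAB only).
fs = 44100;                          % assumed sample rate (Hz)
f0 = 110;                            % assumed nominal pitch (Hz)
t  = (0:round(0.5*fs)-1)'/fs;        % 500 ms time axis
% Stand-in tones; the second is 1 Hz sharp with a different harmonic balance
sigs = {sin(2*pi*f0*t) + 0.5*sin(2*pi*2*f0*t) + 0.25*sin(2*pi*3*f0*t), ...
        sin(2*pi*(f0+1)*t) + 0.2*sin(2*pi*2*(f0+1)*t) + 0.6*sin(2*pi*3*(f0+1)*t)};

N = 4096;  hop = N/2;  nHarm = 8;    % frame length, hop, harmonics to track
w = 0.5 - 0.5*cos(2*pi*(0:N-1)'/N);  % Hann window
nFrames = floor((numel(t) - N)/hop) + 1;
P = zeros(nHarm, nFrames, 2);        % harmonic x frame x signal

for s = 1:2
    for m = 1:nFrames
        seg = sigs{s}((m-1)*hop + (1:N));       % one frame of signal s
        X   = abs(fft(seg .* w));               % windowed magnitude spectrum
        for k = 1:nHarm
            c = round(k*f0*N/fs) + 1;           % bin nearest the k-th harmonic
            r = ceil(0.03*k*f0*N/fs) + 1;       % search about +/-3% around it
            P(k,m,s) = max(X(max(c-r,1):min(c+r,N/2)));
        end
        P(:,m,s) = P(:,m,s) / max(norm(P(:,m,s)), eps);  % keep proportions only
    end
end

% Cosine similarity per frame, averaged over the note (1 = same proportions)
sim = mean(sum(P(:,:,1) .* P(:,:,2), 1));
fprintf('Mean harmonic-profile similarity: %.3f\n', sim);
```

Because each frame is compared separately, changes in the harmonic balance during the note contribute to the score instead of being averaged away by a single FFT over the whole recording.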

At this point, it's less of a problem of which MATLAB functions to select (although a look at this example from the Signal Processing Toolbox documentation may help you get started, if you have that toolbox). It is more a question of researching signal processing and feature classification techniques. Here you really have to go to the literature on musical acoustics. Here is just one abstract link - I don't have access to the ACM but you may have access through your university if you are a student.

Good luck with what sounds like an interesting problem!

0
jorgeh On

I'm not an expert in the subject, but I'm aware of a couple of audio features that can help in such problems: Linear Predictive Coding (LPC) and Mel-Frequency Cepstral Coefficients (MFCCs).

A quick search will reveal plenty of information. As an example I found this one and this one (didn't read them, but they looked relevant).
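To give a flavour of the LPC idea, here is a rough sketch that fits an all-pole model to each note and compares the resulting spectral envelopes. It assumes you have `lpc` from the Signal Processing Toolbox. The LPC order, the FFT size, and the two synthetic tones are placeholders I picked for illustration, and the log-spectral distance at the end is just one simple choice of comparison, not an established standard for this task. (If the Audio Toolbox is available to you, I believe it provides an `mfcc` function that could be used in a similar spirit.)

```matlab
% Sketch: compare LPC spectral envelopes of two notes.
% Requires lpc from the Signal Processing Toolbox.
rng(0);                               % reproducible noise
fs = 44100;  f0 = 110;                % assumed sample rate and pitch (Hz)
t  = (0:round(0.5*fs)-1)'/fs;         % 500 ms time axis
k  = 1:10;                            % harmonic numbers
% Two stand-in tones with different harmonic roll-off, plus a little noise
a1 = sin(2*pi*f0*t*k) * (1./k)'    + 0.01*randn(size(t));
a2 = sin(2*pi*f0*t*k) * (1./k.^2)' + 0.01*randn(size(t));

p    = 30;                            % LPC order (assumption; tune to taste)
nfft = 2048;                          % resolution of the envelope

% Envelope of the all-pole model 1/A(z), in dB, on an nfft-point grid
envDb = @(x) -20*log10(abs(fft(lpc(x, p), nfft)));
E1 = envDb(a1);  E1 = E1(1:nfft/2);   % keep 0 .. fs/2
E2 = envDb(a2);  E2 = E2(1:nfft/2);

% Remove the overall level so that only the envelope shape is compared
E1 = E1 - mean(E1);
E2 = E2 - mean(E2);

lsd = sqrt(mean((E1 - E2).^2));       % log-spectral distance in dB
fprintf('Log-spectral distance: %.2f dB\n', lsd);
```

A smaller distance means more similar envelopes. Since the envelope is a smooth curve rather than a set of sharp peaks, it should be far less sensitive to a small pitch offset than a bin-by-bin FFT correlation.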

That should get you started. Depending on your interest, you can go really deep in this topic. For example, one thing is to compare the steady state of the notes played by different instruments, but my understanding is that the transient (attack) is extremely relevant perceptually.
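If you want to look at the attack and the steady state separately, one simple way to split a recording is with a short-time RMS envelope. This is only a sketch: the 20 ms window, the 90% threshold, and the synthetic note with a 50 ms linear attack are arbitrary assumptions of mine, and real recordings may need a more careful onset detector. `movmean` needs MATLAB R2016a or later.

```matlab
% Sketch: split a note into attack and steady-state segments.
fs = 44100;  f0 = 110;                      % assumed sample rate and pitch (Hz)
t  = (0:round(0.5*fs)-1)'/fs;               % 500 ms time axis
x  = min(t/0.05, 1) .* sin(2*pi*f0*t);      % stand-in note, 50 ms linear attack

win = round(0.020*fs);                      % 20 ms RMS window (assumption)
env = sqrt(movmean(x.^2, win));             % short-time RMS envelope

% Treat everything before the envelope first reaches 90% of its peak as attack
iEnd   = find(env >= 0.9*max(env), 1, 'first');
attack = x(1:iEnd);
steady = x(iEnd+1:end);

fprintf('Attack lasts about %.1f ms\n', 1000*iEnd/fs);
```

You could then run whichever feature you settle on over `attack` and `steady` separately and see which part separates the instruments better.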

Good luck!