I've been working on a part of my app for the past few days where I need to simultaneously play and record an audio file. The task is simply to compare the recording to the audio file that was played and return a matching percentage. Here's what I have done so far, along with some context for my questions:
- The target API is >15
- I decided to use the .wav audio format to simplify decoding the file
- I'm using AudioRecord for recording and MediaPlayer for playing the audio file
- I created a decoder class in order to pass in my audio file and convert it to PCM so I can perform the matching analysis
- For the recording I'm using AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT, and a sample rate of 44100 Hz (see the sketch after this list)
- After I pass the audio file to the decoder, I pass the resulting PCM data to an FFT class in order to get the frequency-domain data needed for my analysis.
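For reference, here is a simplified sketch of the recording setup (class and method names are just illustrative; permission checks and error handling are omitted). It keeps the raw samples in memory instead of writing them to a file:

```java
import android.media.AudioFormat;
import android.media.AudioRecord;
import android.media.MediaRecorder;
import java.io.ByteArrayOutputStream;

public class PcmRecorder {
    private static final int SAMPLE_RATE = 44100;
    private volatile boolean running = true;

    // AudioRecord always delivers raw, uncompressed PCM; ENCODING_PCM_16BIT
    // just selects the sample width. Nothing is written to disk -- the
    // samples accumulate in memory so they can go straight to the FFT stage.
    public byte[] record() {
        int minBuf = AudioRecord.getMinBufferSize(SAMPLE_RATE,
                AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT);
        AudioRecord recorder = new AudioRecord(MediaRecorder.AudioSource.MIC,
                SAMPLE_RATE, AudioFormat.CHANNEL_IN_MONO,
                AudioFormat.ENCODING_PCM_16BIT, minBuf * 4);

        ByteArrayOutputStream pcm = new ByteArrayOutputStream();
        byte[] chunk = new byte[minBuf];
        recorder.startRecording();
        while (running) {
            int n = recorder.read(chunk, 0, chunk.length);
            if (n > 0) pcm.write(chunk, 0, n);
        }
        recorder.stop();
        recorder.release();
        return pcm.toByteArray(); // 16-bit mono PCM samples
    }

    public void stop() { running = false; }
}
```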
And below are a few questions that I have:
- When I record audio using AudioRecord, is the format PCM by default, or do I need to specify this somehow?
- I'm trying to pass the recording to the FFT class in order to acquire the frequency domain data to perform my matching analysis. Is there a way to do this without saving the recording on the user's device?
- After performing the FFT analysis on both files, do I need to store the data in a text file in order to perform the matching analysis? What are some options or possible ways to do this?
- After doing a fair amount of research, all the sources I found cover how to match a recording against songs/music contained in a database. My goal is to see how closely two specific audio files match; how would I go about this? Do I need to create/use hash functions to accomplish my goal? A detailed answer to this would be really helpful.
- Currently I have a separate thread for recording, a separate activity for decoding the audio file, and a separate activity for the FFT analysis. I plan to run the matching analysis in a separate thread or an AsyncTask as well. Do you think this structure is optimal, or is there a better way to do it? Also, should I pass my audio file to the decoder in a separate thread too, or can I do it in the recording thread or the matching-analysis thread?
- Do I need to apply windowing to the audio data before I can do the matching comparison?
- Do I need to decode the .wav files, or can I just compare the two .wav files directly?
- Do I need to perform low-pitching operations on audio files before comparison?
- In order to perform my matching comparison, what data exactly do I need to generate (power spectrum, energy spectrum, spectrogram, etc.)?
Am I going about this the right way or am I missing something?
In apps like Shazam and Midomi, audio matching is done using a technique called audio fingerprinting, which relies on spectrograms and hashing.
It is a fairly involved process; you can find a detailed explanation in Avery Wang's paper on Shazam: http://www.ee.columbia.edu/~dpwe/papers/Wang03-shazam.pdf
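The first step of that process is building a spectrogram: slice the PCM stream into overlapping frames, multiply each frame by a window function (this is where your windowing question comes in), and FFT each frame. Here is a rough sketch, assuming the JTransforms library for the FFT; the frame and hop sizes are just typical values, not anything the paper mandates:

```java
import org.jtransforms.fft.DoubleFFT_1D;

public class Spectrogram {
    static final int FRAME = 4096;    // samples per frame (~93 ms at 44.1 kHz)
    static final int HOP = FRAME / 2; // 50% overlap between frames

    // Returns magnitudes[frame][bin] for bins 0 .. FRAME/2 - 1.
    public static double[][] compute(short[] pcm) {
        DoubleFFT_1D fft = new DoubleFFT_1D(FRAME);
        int frames = Math.max(0, (pcm.length - FRAME) / HOP + 1);
        double[][] mag = new double[frames][FRAME / 2];

        for (int f = 0; f < frames; f++) {
            double[] buf = new double[FRAME];
            for (int i = 0; i < FRAME; i++) {
                // Hann window: tapers frame edges to reduce spectral leakage.
                double w = 0.5 * (1.0 - Math.cos(2.0 * Math.PI * i / (FRAME - 1)));
                buf[i] = pcm[f * HOP + i] * w;
            }
            fft.realForward(buf); // in-place real-to-complex FFT
            mag[f][0] = Math.abs(buf[0]); // DC bin
            for (int b = 1; b < FRAME / 2; b++) {
                mag[f][b] = Math.hypot(buf[2 * b], buf[2 * b + 1]);
            }
        }
        return mag;
    }
}
```

From each frame you then keep only a few of the strongest peaks, and hashes are formed from pairs of peaks together with their time offset, as described in the paper.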
There are some libraries that can do it for you, such as dejavu (https://github.com/worldveil/dejavu) and chromaprint (written in C++). musicg is in Java, but it doesn't perform well with background noise.
Matching two audio files is a complicated process, so like the commenters above, I would also suggest getting it working on a PC first and only then moving it to the phone.
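Since you are comparing two specific files rather than searching a database, the final step simplifies: fingerprint both files the same way and measure what fraction of the recording's hashes also occur in the reference at one consistent time offset. Here is a sketch of that comparison, assuming you have already extracted (hash, frame) landmark pairs from both files (the Landmark type here is hypothetical):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Matcher {
    // One fingerprint entry: a peak-pair hash plus the frame where it occurred.
    public static class Landmark {
        public final long hash;
        public final int frame;
        public Landmark(long hash, int frame) { this.hash = hash; this.frame = frame; }
    }

    // Returns the percentage of the recording's landmarks that match the
    // reference at the single most popular time offset. Voting on offsets
    // makes the score robust to where in the file the recording started.
    public static double matchPercent(List<Landmark> reference, List<Landmark> recording) {
        // Index the reference landmarks by hash.
        Map<Long, List<Integer>> refIndex = new HashMap<Long, List<Integer>>();
        for (Landmark l : reference) {
            List<Integer> frames = refIndex.get(l.hash);
            if (frames == null) {
                frames = new ArrayList<Integer>();
                refIndex.put(l.hash, frames);
            }
            frames.add(l.frame);
        }

        // Each matching hash votes for a time offset between the two files.
        Map<Integer, Integer> offsetVotes = new HashMap<Integer, Integer>();
        for (Landmark l : recording) {
            List<Integer> frames = refIndex.get(l.hash);
            if (frames == null) continue;
            for (int refFrame : frames) {
                int offset = refFrame - l.frame;
                Integer votes = offsetVotes.get(offset);
                offsetVotes.put(offset, votes == null ? 1 : votes + 1);
            }
        }

        int best = 0;
        for (int votes : offsetVotes.values()) {
            if (votes > best) best = votes;
        }
        return recording.isEmpty() ? 0.0 : 100.0 * best / recording.size();
    }
}
```

There is no need to write anything to a text file in between; the landmark lists for both files can stay in memory and be handed directly to a comparison like this.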