I'm building a web app that needs voice-activated commands. I'm using getUserMedia for the audio input.
The user first "calibrates" each command by recording their own speech. For example, for a "stop" command, the user says the word "stop" and the app saves the audio snippet. Later, to issue the "stop" command, the user says "stop" again.
Now the question is: is there any way to compare the command (audio input) the user issues against the audio commands they recorded earlier during calibration, and recognize which one it is? In other words, can I compare a live audio stream against a stored audio file? I hope someone can point me in the right direction, as I've been researching this for a long time already.
Thanks in advance.
Note: I'm not trying to recognize music the way SoundHound does. I also didn't think I needed speech recognition, since it seemed too complex and unnecessary for the mechanics I need. Apparently, though, this is hard, if not impossible, to do without speech recognition. Can anyone recommend a speech recognition library/API (hopefully JavaScript) that I can try out?
There is no way to do this without speech recognition, because the chance of a human being producing two identical audio recordings of the same word is practically zero: timing, pitch, loudness and background noise differ on every utterance.
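Since the question asks for a JavaScript option: one candidate is the Web Speech API, which some browsers expose as `SpeechRecognition` or `webkitSpeechRecognition`. Below is a minimal sketch, not a tested integration. The `COMMANDS` list and the helper names `matchCommand` and `listenForCommands` are made up for illustration, and browser support varies, so check availability before relying on it.

```javascript
// Hypothetical command list; in the question these would be the user's command words.
const COMMANDS = ["stop", "play", "next"];

// Pure helper: return the first command (in list order) that appears as a
// whole word in the recognized transcript, or null if none matches.
function matchCommand(transcript, commands) {
  const words = transcript.toLowerCase().split(/[^a-z0-9']+/).filter(Boolean);
  for (const command of commands) {
    if (words.includes(command.toLowerCase())) return command;
  }
  return null;
}

// Browser-only: start continuous recognition and report matched commands.
// Assumes the browser exposes SpeechRecognition or webkitSpeechRecognition.
function listenForCommands(commands, onCommand) {
  const Recognition = window.SpeechRecognition || window.webkitSpeechRecognition;
  if (!Recognition) throw new Error("Web Speech API is not available in this browser");
  const recognition = new Recognition();
  recognition.continuous = true;      // keep listening after each result
  recognition.interimResults = false; // only act on final transcripts
  recognition.lang = "en-US";         // assumed language
  recognition.onresult = (event) => {
    for (let i = event.resultIndex; i < event.results.length; i++) {
      const transcript = event.results[i][0].transcript;
      const command = matchCommand(transcript, commands);
      if (command) onCommand(command);
    }
  };
  recognition.start();
  return recognition;
}
```

In a page you would call something like `listenForCommands(COMMANDS, (c) => console.log(c))`. The matching is plain string comparison on the transcript, so no calibration recordings are involved.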
You might be able to recognize the voice pitch and compare it fairly accurately with your calibration audio, but recognizing the spoken words by simple comparison of audio that comes from a human rather than a machine will never work.
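To make the pitch part concrete, here is a rough sketch of estimating the fundamental frequency of one audio frame by autocorrelation. The function name `estimatePitch` and the 60 to 500 Hz search range are my assumptions, and the input is assumed to be mono PCM samples (for example, copied out of a Web Audio buffer). Treat it as an illustration rather than a production pitch tracker.

```javascript
// Estimate the fundamental frequency (Hz) of one audio frame by autocorrelation.
// `samples` is an array (or Float32Array) of mono PCM values; `sampleRate` is in Hz.
// minHz / maxHz bound the search to a rough speaking range (assumed values).
function estimatePitch(samples, sampleRate, minHz = 60, maxHz = 500) {
  const minLag = Math.floor(sampleRate / maxHz);
  const maxLag = Math.min(Math.floor(sampleRate / minHz), samples.length - 1);
  let bestLag = -1;
  let bestScore = 0;
  for (let lag = minLag; lag <= maxLag; lag++) {
    // Unnormalized autocorrelation at this lag; it peaks when the frame
    // lines up with a copy of itself shifted by one pitch period.
    let sum = 0;
    for (let i = 0; i + lag < samples.length; i++) {
      sum += samples[i] * samples[i + lag];
    }
    if (sum > bestScore) {
      bestScore = sum;
      bestLag = lag;
    }
  }
  // No positive correlation found (e.g. silence): report no pitch.
  return bestLag > 0 ? sampleRate / bestLag : null;
}
```

You could run this on the calibration snippet and on the live input and compare the two estimates. A close match suggests the same voice at a similar pitch, but it says nothing about which word was spoken.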
You could classify certain words/commands based on changes in pitch, pause lengths between syllables, formants, and so on, but those are already the first steps of speech recognition.
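As an illustration of what those first steps can look like, here is a sketch of template matching with dynamic time warping (DTW), a classic technique for isolated-word matching. It is my substitution, not something the question's setup prescribes. Per-frame RMS energy and zero-crossing rate stand in as crude features for the pitch, pause and formant cues mentioned above, and all function names and the frame size of 256 are assumptions.

```javascript
// Split mono PCM samples into fixed-size frames and compute two crude
// features per frame: RMS energy and zero-crossing rate.
function frameFeatures(samples, frameSize = 256) {
  const features = [];
  for (let start = 0; start + frameSize <= samples.length; start += frameSize) {
    let energy = 0;
    let crossings = 0;
    for (let i = start; i < start + frameSize; i++) {
      energy += samples[i] * samples[i];
      if (i > start && (samples[i] >= 0) !== (samples[i - 1] >= 0)) crossings++;
    }
    features.push([Math.sqrt(energy / frameSize), crossings / frameSize]);
  }
  return features;
}

// Euclidean distance between two feature vectors of equal length.
function frameDistance(a, b) {
  let sum = 0;
  for (let k = 0; k < a.length; k++) sum += (a[k] - b[k]) ** 2;
  return Math.sqrt(sum);
}

// DTW cost between two feature sequences, divided by their combined length
// so that short and long utterances give comparable numbers.
function dtwDistance(seqA, seqB) {
  const n = seqA.length;
  const m = seqB.length;
  if (n === 0 || m === 0) return Infinity;
  const cost = Array.from({ length: n + 1 }, () => new Array(m + 1).fill(Infinity));
  cost[0][0] = 0;
  for (let i = 1; i <= n; i++) {
    for (let j = 1; j <= m; j++) {
      const d = frameDistance(seqA[i - 1], seqB[j - 1]);
      // Allow a frame to be stretched, skipped over, or matched one-to-one.
      cost[i][j] = d + Math.min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]);
    }
  }
  return cost[n][m] / (n + m);
}

// Pick the calibrated template closest to the input. `templates` maps a
// command name to the feature sequence of its calibration recording.
function closestCommand(inputFeatures, templates) {
  let best = null;
  let bestDistance = Infinity;
  for (const [name, features] of Object.entries(templates)) {
    const distance = dtwDistance(inputFeatures, features);
    if (distance < bestDistance) {
      bestDistance = distance;
      best = name;
    }
  }
  return { command: best, distance: bestDistance };
}
```

DTW tolerates the same word being spoken faster or slower, which a sample-by-sample comparison cannot. You would still need a distance threshold, tuned by experiment, to reject input that matches no command, and features this crude will likely confuse similar-sounding words.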