I am building a speech recognition application for iOS in Objective-C/C++ to correct the speaker's pronunciation.
I am using Mel-frequency cepstral coefficients (MFCCs) and matching the two sound waves using dynamic time warping (DTW).
Please correct me if I am wrong.
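To make the matching step concrete, this is roughly what I mean by DTW over MFCC frames. It is only a minimal sketch, not the Match-Box code: the names `Frame`, `frameDistance` and `dtwDistance` are made up for illustration, and I am assuming a Euclidean distance between frames and the basic three-way step pattern with no window constraint.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// One MFCC vector per analysis frame (hypothetical type for this sketch).
using Frame = std::vector<double>;

// Euclidean distance between two MFCC frames of equal length.
double frameDistance(const Frame& a, const Frame& b) {
    double sum = 0.0;
    for (std::size_t k = 0; k < a.size(); ++k) {
        const double d = a[k] - b[k];
        sum += d * d;
    }
    return std::sqrt(sum);
}

// Basic DTW: minimal accumulated cost of aligning sequence x with sequence y.
double dtwDistance(const std::vector<Frame>& x, const std::vector<Frame>& y) {
    const std::size_t n = x.size();
    const std::size_t m = y.size();
    const double inf = std::numeric_limits<double>::infinity();

    // acc[i][j] = best cost of aligning the first i frames of x with the first j frames of y.
    std::vector<std::vector<double>> acc(n + 1, std::vector<double>(m + 1, inf));
    acc[0][0] = 0.0;

    for (std::size_t i = 1; i <= n; ++i) {
        for (std::size_t j = 1; j <= m; ++j) {
            // Allowed predecessors: insertion, deletion, or diagonal match.
            const double best = std::min({acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1]});
            acc[i][j] = frameDistance(x[i - 1], y[j - 1]) + best;
        }
    }
    return acc[n][m];
}
```

A single total cost like this only tells me how different the two recordings are overall, not where they differ.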
Now I want to know which word in the sentence mismatches between the two sound files.
E.g., my two sound files say:
 1. I live in New York.
 2. I laav in New York.
My algorithm should somehow point to the 2nd word with some sort of indication.
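The kind of indication I have in mind could look like the sketch below. It rests on assumptions I have not validated: that I can get one local alignment cost per frame of the reference recording (for example by backtracking the DTW path), and that I already know which frames belong to which word in the reference. `WordSpan` and `worstMatchingWord` are hypothetical names, not part of any library.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical description of one word in the reference recording:
// it covers frames [begin, end) of the reference MFCC sequence.
struct WordSpan {
    std::size_t begin;
    std::size_t end;
};

// Returns the 0-based index of the word whose frames have the highest mean
// alignment cost, or -1 if no word covers any frame.
int worstMatchingWord(const std::vector<double>& costPerRefFrame,
                      const std::vector<WordSpan>& words) {
    int worst = -1;
    double worstMean = std::numeric_limits<double>::lowest();

    for (std::size_t w = 0; w < words.size(); ++w) {
        // Clamp the span so a bad annotation cannot read past the cost vector.
        const std::size_t end = std::min(words[w].end, costPerRefFrame.size());
        if (words[w].begin >= end) continue;  // empty or out-of-range span

        double sum = 0.0;
        for (std::size_t i = words[w].begin; i < end; ++i) sum += costPerRefFrame[i];
        const double mean = sum / static_cast<double>(end - words[w].begin);

        if (mean > worstMean) {
            worstMean = mean;
            worst = static_cast<int>(w);
        }
    }
    return worst;
}
```

Using the mean rather than the sum is meant to keep long words from being flagged just because they span more frames. What I do not know is how to obtain reliable word boundaries and per-frame costs in the first place.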
I have used the open-source Match-Box library for reference. Here is its link. Any new algorithm or library is welcome.
PS: I don't want to use text-to-speech synthesis or speaker recognition.
Please direct me to the right resources if I have posted this question in the wrong place.
Any small hint is also welcome.