Audio Comparison using Microsoft Speech Recognition Engine


I have an application in which the user can speak a word and is given the percentage accuracy of the word spoken, i.e. how clearly the engine recognized the word.

This all works fine, but I have a dilemma about which words need to be added to the dictionary that I give to the recognition engine.

If I add all words starting with "p" (for the case "pen"), then words like "pendant", "pent", etc. are all added to the dictionary. In that case I do not get "pen" as the recognized word.

Instead, I always get other words like "pendant".

But if I add only a limited set of words to the dictionary, such as "pe" and "pen", then for the same recorded file I get the recognized word "pen".

So the result clearly depends on the words we give to the dictionary.

I have conveyed this to my client. But what they want is that the user may also speak wrong words for a given input word; in that case they do not expect an accuracy score, but they still want the recognized text.

I have done what I could for this issue, but my client needs something more.
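One partial workaround, sketched below under assumed names (the choice words, the wave-file path `recording.wav`, and the 0.6 cut-off are all placeholders, not values from the original code), is to keep the choice grammar but reject results whose `Confidence` falls below a threshold, so a wrong word is reported as unrecognized instead of being mapped to the nearest dictionary entry:

```csharp
using System;
using System.Globalization;
using System.Speech.Recognition;

class ConfidenceCheck
{
    static void Main()
    {
        using (var engine = new SpeechRecognitionEngine(new CultureInfo("en-US")))
        {
            // A small command grammar, as in the original code.
            engine.LoadGrammar(new Grammar(new GrammarBuilder(new Choices("pe", "pen"))));

            // Placeholder path for the user's recorded audio.
            engine.SetInputToWaveFile("recording.wav");

            RecognitionResult result = engine.Recognize();

            // Reject matches the engine is not confident about instead of
            // reporting the nearest dictionary word as a hit.
            const float threshold = 0.6f; // assumed cut-off, tune per application
            if (result != null && result.Confidence >= threshold)
                Console.WriteLine("{0} ({1:P0})", result.Text, result.Confidence);
            else
                Console.WriteLine("Word not recognized.");
        }
    }
}
```

This does not make the engine return the wrong word's text (a command grammar can only return one of its choices), but it at least stops false positives from being reported as correct matches.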

Code :

public OdllSpeechProcessor(string culture, string speechContent, string filePath)
{
    try
    {
        int counter = 0;
        string line;
        cultureInfo         = new CultureInfo(culture);
        recognitionEngine   = new SpeechRecognitionEngine(cultureInfo);
        words               = new Choices();
        gb                  = new GrammarBuilder();
        gb.Culture          = cultureInfo;
        rndAccuracy         = new Random();

        // Read the dictionary file and keep only the words that start
        // with the expected speech content (e.g. "pen").
        using (System.IO.StreamReader file = new System.IO.StreamReader(filePath))
        {
            while ((line = file.ReadLine()) != null)
            {
                if (line != "")
                {
                    if (line.StartsWith(speechContent, true, cultureInfo))
                    {
                        words.Add(line);
                        counter++;
                    }
                }
            }
        }

        // Adding words to the grammar builder.
        gb.Append(words);

        // Create the actual Grammar instance, with the words from the source audio.
        g = new Grammar(gb);

        // Load the created grammar onto the speech recognition engine.
        recognitionEngine.LoadGrammarAsync(g);
    }
    catch (Exception ex)
    {
        System.Diagnostics.Debug.WriteLine(ex.Message);
    }
}

Do any experts have a solution for this? Any help will be appreciated.

Thanks

1 Answer

Answered by Eric Brown:

You're using a command grammar (i.e., a set of choices). With a command grammar, the engine tries its best to find a match, which can easily result in false positives (as you've seen). You might want to investigate a dictation grammar, particularly the pronunciation grammar, as I've outlined in my answer to this question. Note that the solution I outlined uses some interfaces that aren't available in C# (or at least exposed via System.Speech.Recognition).
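Following that suggestion, a minimal sketch of loading a dictation grammar instead of a choice grammar (the wave-file path `recording.wav` is a placeholder; this uses only the stock `System.Speech.Recognition` API, not the SAPI pronunciation interfaces mentioned above, and requires Windows):

```csharp
using System;
using System.Globalization;
using System.Speech.Recognition;

class DictationDemo
{
    static void Main()
    {
        using (var engine = new SpeechRecognitionEngine(new CultureInfo("en-US")))
        {
            // A dictation grammar matches free-form speech instead of a fixed
            // word list, so an out-of-vocabulary word is no longer forced onto
            // the nearest choice in the dictionary.
            engine.LoadGrammar(new DictationGrammar());

            // Placeholder path for the user's recorded audio.
            engine.SetInputToWaveFile("recording.wav");

            RecognitionResult result = engine.Recognize();
            if (result != null)
            {
                // Confidence is a 0..1 score that can be shown as a percentage.
                Console.WriteLine("{0} ({1:P0})", result.Text, result.Confidence);
            }
        }
    }
}
```

With this approach the recognized text can then be compared against the expected word (e.g. "pen") in application code, which matches the client's requirement of getting the recognized text even when a wrong word is spoken.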