How can I use System.Speech to recognize an exact phrase as a command

For example, let's say I want a command "center" that clicks the mouse in the center of the screen. It's a trivial example, but I'm more interested in the grammar aspects of it.

What if I only want to match "center"?

So if I pause, say "center", and then pause again, that should be a match.

But if I say "I am in the center of the room", that should not be a match.

The following code seems to match the word "center" no matter where in a phrase it is spoken:

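    // speechRecognitionEngine is a SpeechRecognitionEngine created elsewhere.
    // A one-word choice list containing only the command word.
    Choices center = new Choices( new string[] { "center" } );
    SemanticResultKey centerKeys = new SemanticResultKey( "center", center );

    GrammarBuilder centerGrammarBuilder = new GrammarBuilder();
    centerGrammarBuilder.Append( centerKeys );

    // Replace any previously loaded grammars with the "center" grammar.
    speechRecognitionEngine.UnloadAllGrammars();
    speechRecognitionEngine.LoadGrammar(new Grammar(centerGrammarBuilder));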

Answer by Matt Johnson:

Speech engines usually do a decent job of not recognizing an in-grammar word in the middle of a sentence, but not always. If you have hit one of those edge cases where the engine recognizes terms mid-sentence, I can recommend two things that help:

  • Add a garbage rule to your grammar, then ignore any recognition event that contains the garbage rule. This is usually not recommended (although it can work): because people rarely use the garbage rule, the trade-offs made during model building tend to go against it, and its performance suffers. You may also notice that it works better with some language models than with others; again, this is a consequence of model building and tuning. (Inside a .grxml file it looks like <ruleref special="GARBAGE"/>.) http://msdn.microsoft.com/en-us/library/system.speech.recognition.srgsgrammar.srgsruleref.garbage(v=VS.85).aspx
  • Check the confidence score of the recognized word and tune a threshold for it. Even if the engine recognizes the word mid-sentence, you should get a much lower confidence score for that phrase. Unfortunately, getting this threshold right sometimes requires a lot of audio.
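A minimal sketch of the first suggestion, assuming .NET Framework on Windows with a reference to System.Speech and an installed recognizer. It loads two grammars: the bare command "center", and a version wrapped in SrgsRuleRef.Garbage (the API form of `<ruleref special="GARBAGE"/>`) with speech before it, after it, or both. Matches from the garbage-wrapped grammar are ignored. Splitting the command into two named grammars, and the names "bare" and "embedded", are my own illustrative choices, not part of the answer.

```csharp
using System;
using System.Globalization;
using System.Speech.Recognition;
using System.Speech.Recognition.SrgsGrammar;

static class StandaloneCommandDemo
{
    // Hypothetical grammar names, used only to tell the two grammars apart.
    public const string BareGrammarName = "bare";
    public const string EmbeddedGrammarName = "embedded";

    // Act only when the grammar without garbage produced the match.
    public static bool IsStandaloneCommand(string grammarName) =>
        grammarName == BareGrammarName;

    static Grammar[] BuildGrammars(CultureInfo culture)
    {
        // "center" on its own.
        var bareBuilder = new GrammarBuilder("center") { Culture = culture };
        var bare = new Grammar(bareBuilder) { Name = BareGrammarName };

        // "center" preceded and/or followed by arbitrary speech (GARBAGE).
        var embeddedRule = new SrgsRule("embeddedCenter",
            new SrgsOneOf(
                new SrgsItem(SrgsRuleRef.Garbage, new SrgsItem("center")),
                new SrgsItem(new SrgsItem("center"), SrgsRuleRef.Garbage),
                new SrgsItem(SrgsRuleRef.Garbage, new SrgsItem("center"), SrgsRuleRef.Garbage)));
        var document = new SrgsDocument(embeddedRule) { Culture = culture };
        var embedded = new Grammar(document) { Name = EmbeddedGrammarName };

        return new[] { bare, embedded };
    }

    static void Main()
    {
        using (var engine = new SpeechRecognitionEngine())
        {
            foreach (Grammar grammar in BuildGrammars(engine.RecognizerInfo.Culture))
                engine.LoadGrammar(grammar);

            engine.SpeechRecognized += (sender, e) =>
            {
                if (IsStandaloneCommand(e.Result.Grammar.Name))
                    Console.WriteLine("Command: center");
                else
                    Console.WriteLine("Ignored (inside a longer phrase): " + e.Result.Text);
            };

            engine.SetInputToDefaultAudioDevice();
            engine.RecognizeAsync(RecognizeMode.Multiple);
            Console.WriteLine("Listening; press Enter to quit.");
            Console.ReadLine();
        }
    }
}
```

Whether the garbage-wrapped grammar actually wins for a sentence like "I am in the center of the room" depends on the recognizer and its language model, which is exactly the caveat above.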
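A minimal sketch of the second suggestion, under the same assumptions (Windows, .NET Framework, System.Speech referenced). RecognitionResult.Confidence is a value between 0.0 and 1.0, and each entry in Result.Words carries its own Confidence as well. The threshold of 0.85 is a placeholder I picked, not a recommended value. The handler prints both accepted and rejected scores so that you can collect the audio-driven data needed to tune it.

```csharp
using System;
using System.Speech.Recognition;

static class ConfidenceFilterDemo
{
    // Placeholder threshold (assumption); tune it against recorded audio.
    public const float MinConfidence = 0.85f;

    // Pure decision logic, kept separate so it can be tuned without a microphone.
    public static bool Accept(float confidence, float threshold) =>
        confidence >= threshold;

    static void Main()
    {
        using (var engine = new SpeechRecognitionEngine())
        {
            var builder = new GrammarBuilder("center") { Culture = engine.RecognizerInfo.Culture };
            engine.LoadGrammar(new Grammar(builder));

            engine.SpeechRecognized += (sender, e) =>
            {
                float confidence = e.Result.Confidence;
                string verdict = Accept(confidence, MinConfidence) ? "Command" : "Rejected";
                Console.WriteLine($"{verdict}: {e.Result.Text} (phrase confidence {confidence:F2})");

                // Per-word scores, useful when tuning the threshold.
                foreach (RecognizedWordUnit word in e.Result.Words)
                    Console.WriteLine($"  {word.Text}: {word.Confidence:F2}");
            };

            engine.SetInputToDefaultAudioDevice();
            engine.RecognizeAsync(RecognizeMode.Multiple);
            Console.WriteLine("Listening; press Enter to quit.");
            Console.ReadLine();
        }
    }
}
```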