The effect of the grammar in the Web Speech API


In examples for the Web Speech API, a grammar is always specified. For example, in MDN's colour change example, the grammar is:

#JSGF V1.0;
grammar colors;
public <color> = aqua | azure | beige | bisque | black | blue | brown | chocolate | coral | crimson | cyan | fuchsia | ghostwhite | gold | goldenrod | gray | green | indigo | ivory | khaki | lavender | lime | linen | magenta | maroon | moccasin | navy | olive | orange | orchid | peru | pink | plum | purple | red | salmon | sienna | silver | snow | tan | teal | thistle | tomato | turquoise | violet | white | yellow ;

However, in actually using the API (on Chrome 54.0.2840.71), the result function:

  1. Sometimes returns strings that do not fit the supplied grammar
  2. Does not provide the parse tree that describes the speech

What then does the grammar actually do? How can I get either of these behaviours (restricting to the grammar and seeing the parse tree)?


There are 2 answers

jbflow On

I know this is an old question, but I've been working through several similar ones while trying to figure this out myself, and I have a workaround. The grammar doesn't seem to work, at least not reliably or as expected.

As a workaround, I've written a function that goes some way toward solving the issue. Pass it the event.results from the SpeechRecognition.onresult callback, and make sure maxAlternatives is set to something like 10. Also pass a list of phrases. It returns the first transcript it finds that contains one of the phrases; otherwise it returns the transcript with the highest confidence.

function ExtractTranscript(phrases, results) {
  // Build the pattern once; it matches any transcript containing one of the phrases.
  const pattern = new RegExp(phrases.join("|"));
  const alternatives = results[0];
  // Loop by index: for...in can also visit non-index properties such as "length".
  for (let i = 0; i < alternatives.length; i++) {
    if (pattern.test(alternatives[i].transcript)) {
      return alternatives[i].transcript; // Return the first alternative that matches
    }
  }
  return alternatives[0].transcript; // Otherwise return the highest-confidence alternative
}
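
For context, here is a rough sketch of how a function like this might be hooked up to a recognizer. The helper name `listenForPhrases` and its parameters are made up for illustration. It assumes `recognition` is a `SpeechRecognition` instance the caller has already created (in Chrome that is typically `webkitSpeechRecognition`), and that `extract` is any function with the same `(phrases, results)` signature as the one above.

```javascript
// Hypothetical helper (not part of the Web Speech API): configures a recognizer
// and routes every result through `extract`.
// `recognition` is assumed to be a SpeechRecognition instance, created by the caller,
// e.g. new (window.SpeechRecognition || window.webkitSpeechRecognition)().
function listenForPhrases(recognition, phrases, extract, onTranscript) {
  // Ask for several alternatives so there is something to search through.
  recognition.maxAlternatives = 10;
  // Hand the chosen transcript to the caller each time a result arrives.
  recognition.onresult = (event) => {
    onTranscript(extract(phrases, event.results));
  };
  return recognition;
}
```

The caller still has to call `recognition.start()` itself; the sketch only sets `maxAlternatives` and the `onresult` handler.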

There are probably ways of improving on this solution for long transcripts etc., but it works in my situation for short, command-like phrases. Hopefully it helps someone else out too.
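
One possible refinement, as a sketch only: escape the phrases so that regex metacharacters in them are treated literally, match whole words case-insensitively, and return the matched text instead of the whole transcript. The names `escapeRegExp` and `findPhrase` are hypothetical, and the word-boundary check assumes each phrase starts and ends with a letter or digit.

```javascript
// Escape characters that have special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical refinement: returns the first phrase found as a whole word in any
// alternative of the first result (lower-cased), or null if none is found.
function findPhrase(phrases, results) {
  if (phrases.length === 0) return null;
  const pattern = new RegExp(
    "\\b(?:" + phrases.map(escapeRegExp).join("|") + ")\\b",
    "i"
  );
  const alternatives = results[0];
  for (let i = 0; i < alternatives.length; i++) {
    const match = pattern.exec(alternatives[i].transcript);
    if (match) return match[0].toLowerCase();
  }
  return null;
}

// Usage with a mock of event.results (plain arrays standing in for the real objects).
const mockResults = [[
  { transcript: "bred", confidence: 0.9 }, // contains "red" only as a substring
  { transcript: "Red", confidence: 0.6 },
]];
console.log(findPhrase(["red", "blue"], mockResults)); // "red"
```

Returning null rather than the top transcript lets the caller decide what to do when nothing in the phrase list was heard.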