Odd behaviour in Google Web Speech API


I'm working with Google's Web Speech API using Google Chrome (55.0.2883.87) and I'm experiencing some very weird behaviour.

When I have it speak a name followed by a number (like John 4), it usually just speaks the name and the number, as it should. For some names, however, it inserts the word "chapter" between the name and the number, so "Daniel 4" becomes "Daniel Chapter 4".

I picked some random names and tested them with the following code:

<script>
var names = ['Brian', 'John', 'Mike', 'Julia', 'Daniel', 'Michael', 'David', 'Jason', 'Jack'];

names.forEach(function(name) {
  var msg = new SpeechSynthesisUtterance(name + ' 4');
  window.speechSynthesis.speak(msg);
});
</script>

The msg variable doesn't contain the word "chapter" when I log it with console.log().

Of these 9 names, only John and Daniel are spoken with the word "chapter" inserted between the name and the number.

Question

Why does this happen, and what criteria determine which names are affected?


There are 2 answers

2
Kaiido On BEST ANSWER

I think that one of your namesakes wrote something in a famous book, and a certain John did too. I would guess that it does the same for Jeremiah and the others.

But I can't reproduce it on either my 55.0.2883.95 or my 57.0.2954.0 on Mac...

Maybe it was a Christmas Easter egg.

So many religious references in this answer...

0
russa On

I do not think that this is an "Easter egg".
Speech synthesis engines generally try to interpret text fragments in some meaningful way, e.g. reading numbers with punctuation as a date if the fragment "looks" like one.

Explanation

With Google, I would guess that they rely heavily on statistics to decide whether a text fragment should be interpreted one way or the other.

In practice this may fail in specific cases, for several reasons: for example, if the input really is a special case; if there is not enough "context" to derive the correct/intended meaning (especially likely for very short sentences or fragments); or if the text corpus used to derive the statistics is not balanced with respect to common usage.

Suggestion

Depending on the engine, the behavior can often be controlled to some extent by formatting the input text differently.

E.g. testing your code snippet in Chrome: if you write out the number ("four" instead of 4), or insert a comma after the name (i.e. name + ', 4'), then the speech engine will not insert "chapter" (note that the comma also introduces a short pause).
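The two workarounds above could be wrapped up roughly like this. This is only a sketch: buildSpokenText and NUMBER_WORDS are hypothetical names made up for illustration, not part of the Web Speech API, and whether the engine still inserts "chapter" depends on the browser and voice in use, so treat it as something to try rather than a guaranteed fix.

```javascript
// Hypothetical helper (not part of the Web Speech API): builds the text
// handed to SpeechSynthesisUtterance so that "<name> <number>" is less
// likely to be read as a book-and-chapter reference.
var NUMBER_WORDS = ['zero', 'one', 'two', 'three', 'four',
                    'five', 'six', 'seven', 'eight', 'nine'];

function buildSpokenText(name, number, strategy) {
  if (strategy === 'words' &&
      Number.isInteger(number) && number >= 0 && number <= 9) {
    // Workaround 1: spell the digit out, e.g. "Daniel four".
    return name + ' ' + NUMBER_WORDS[number];
  }
  // Workaround 2 (also the fallback for numbers outside 0-9):
  // put a comma after the name, e.g. "Daniel, 4". This adds a short pause.
  return name + ', ' + number;
}

// Only speak where the Web Speech API is available (i.e. in a browser).
if (typeof window !== 'undefined' && 'speechSynthesis' in window) {
  ['John', 'Daniel'].forEach(function (name) {
    var msg = new SpeechSynthesisUtterance(buildSpokenText(name, 4, 'comma'));
    window.speechSynthesis.speak(msg);
  });
}
```

Pass 'words' instead of 'comma' if the pause introduced by the comma is unwanted; the sketch only spells out single digits and falls back to the comma form for anything else.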