Low accuracy of Watson Speech-to-text with custom model


The Watson Speech to Text service did not recognize my accent, so I created a custom model. Here are the results before and after using it.

Test Results

Before integrating the model: When you have a motto that they have in the. Sheila. Jabba among the. The woman. The.

After integrating the model: We give Omatta David. Sri Lanka. Jabba among the. Number. Gov.

Actual audio: Audio 49, Wijayaba Mawatha, Kalubowila, Dehiwela, Sri Lanka. Government. Gov.

How I included the custom model: I used the same files as the demo, forked from GitHub, and in socket.js I added the customization ID as shown in the picture. There are other ways of including a custom model (ways to integrate custom model), but I would like to know whether the method I used is correct.
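For checking the socket.js change: the demo streams audio over a WebSocket to the recognize endpoint, and the custom model is attached through a query parameter on that URL. The sketch below only builds such a URL. It assumes the legacy `stream.watsonplatform.net` endpoint and the `watson-token` query parameter used by browser clients of that era; both are assumptions, so take the real URL from your service credentials. Later API versions rename `customization_id` to `language_customization_id`.

```python
from urllib.parse import urlencode

# Assumed legacy endpoint; current instances use a region-specific URL.
WS_BASE = "wss://stream.watsonplatform.net/speech-to-text/api/v1/recognize"


def recognize_url(token: str, model: str, customization_id: str) -> str:
    """Build the WebSocket recognize URL with a custom language model attached."""
    params = {
        "watson-token": token,  # assumed token-based auth, as in browser demos
        "model": model,  # must be the base model the custom model was built on
        "customization_id": customization_id,  # later: language_customization_id
    }
    return f"{WS_BASE}?{urlencode(params)}"


print(recognize_url("TOKEN", "en-US_BroadbandModel", "your-customization-id"))
```

If the URL socket.js opens does not carry the customization ID (and a `model` matching the custom model's base model), the service falls back to the plain base model.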

Here is the Python code I used to create the custom model (code link).

Here is the corpus result I got after executing the Python code, in JSON format (corpus file).

Here is the custom model (the custom model text file included in the code), in which I have listed all the Sri Lankan roads.
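For reference, creating and training a custom language model comes down to three REST calls: create the model, add a corpus, and train. The sketch below only builds those requests with the standard library and does not send them. The base URL is an assumption (use the one from your credentials), authentication is omitted, and the model name, corpus name and description are placeholders.

```python
import json
from urllib.request import Request

# Assumed base URL; replace with the one from your service credentials.
API = "https://stream.watsonplatform.net/speech-to-text/api/v1"


def create_model_request(name: str, base_model: str) -> Request:
    """POST /v1/customizations: create an empty custom language model."""
    body = json.dumps({
        "name": name,
        "base_model_name": base_model,
        "description": "Sri Lankan street names",  # placeholder
    }).encode("utf-8")
    return Request(f"{API}/customizations", data=body, method="POST",
                   headers={"Content-Type": "application/json"})


def add_corpus_request(customization_id: str, corpus_name: str, text: str) -> Request:
    """POST the plain-text corpus; the service extracts OOV words from it."""
    url = f"{API}/customizations/{customization_id}/corpora/{corpus_name}"
    return Request(url, data=text.encode("utf-8"), method="POST",
                   headers={"Content-Type": "text/plain"})


def train_request(customization_id: str) -> Request:
    """POST .../train: required after every change to corpora or words."""
    return Request(f"{API}/customizations/{customization_id}/train",
                   data=b"", method="POST")
```

Once credentials are added, each request can be sent with `urllib.request.urlopen`. Between the calls, the corpus status should reach `analyzed` before training, and the model status should reach `available` before its ID is passed to recognize.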

I forked the file and edited the socket.js as follows.


Nathan Friedly (best answer)

First, unless I'm missing something, several of the words you said don't actually appear in the corpus1.txt file. The service needs to know about the words you expect it to transcribe.
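A quick way to find such gaps is to compare the words of the expected transcript with the words in the corpus file. This is a minimal sketch: the tokenization (lowercase letters and apostrophes only, digits ignored) is a simplifying assumption, and the function does not know the base model's vocabulary. Common words such as "sri" or "lanka" may therefore be flagged even though the service already knows them.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase words made of letters and apostrophes; digits are ignored."""
    return re.findall(r"[a-z']+", text.lower())


def missing_words(expected: str, corpus_text: str) -> list[str]:
    """Words from the expected transcript that never occur in the corpus, in order."""
    vocab = set(tokenize(corpus_text))
    missing: list[str] = []
    for word in tokenize(expected):
        if word not in vocab and word not in missing:
            missing.append(word)
    return missing


# Hypothetical two-line corpus; in practice, read corpus1.txt instead.
print(missing_words("49, Wijayaba Mawatha, Kalubowila, Dehiwela, Sri Lanka",
                    "Wijayaba Mawatha\nGalle Road"))
# ['kalubowila', 'dehiwela', 'sri', 'lanka']
```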

Next, the service is geared toward common speech patterns. A list of arbitrary names is difficult because the service can't guess a word from its context. Context is normally what the custom corpus provides, but that doesn't work in this case unless you happen to read the names in the exact order they appear in the corpus. Even then, each name appears only once and without any surrounding words the service would already recognize.

To compensate for this, in addition to the corpus of custom words, you may need to provide a sounds_like for many of them to indicate pronunciation: http://www.ibm.com/watson/developercloud/doc/speech-to-text/custom.shtml#addWords
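The addWords call takes a JSON body listing each word with one or more `sounds_like` spellings and an optional `display_as`. The sketch below builds that body. The `sounds_like` spellings are hypothetical guesses, not verified pronunciations; the idea is to spell each name the way an English speaker would read it aloud, and then to refine the spellings by comparing them with what the service actually transcribes.

```python
import json


def words_payload(pronunciations: dict[str, list[str]]) -> str:
    """JSON body for POST /v1/customizations/{customization_id}/words."""
    return json.dumps({"words": [
        {"word": word, "sounds_like": sounds, "display_as": word}
        for word, sounds in pronunciations.items()
    ]})


# Hypothetical spellings; adjust them by listening to your own recordings.
payload = words_payload({
    "Wijayaba": ["wee jaya baa"],
    "Kalubowila": ["kaloo bo wila"],
})
print(payload)
```

Adding or changing words does not take effect until the custom model is trained again.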

This is quite a bit more work (it must be done for each word that the service doesn't recognize correctly), but should improve your results.

Third, the audio file you provided has a fair amount of background noise, which will degrade your results. A better microphone or a quieter recording location will help.

Finally, speaking more clearly, with precise diction and as close to a "standard" US English accent as you can manage, should also improve the results.

Tony Lee

The main problem I see is that the audio is very noisy (I hear train tracks in the background). The second issue is that the out-of-vocabulary (OOV) words extracted from the corpus should be checked for pronunciation accuracy. The third issue could be the speaker's accent: I assume you are using the US English model, and it has trouble with accented English. As for the custom model's training data, you can try repeating some of the words in it to give the new words more weight.
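The repetition suggestion can be applied when generating the corpus file. This is a minimal sketch under two assumptions: the corpus is plain text with one phrase per line, and the repeat count is a hypothetical setting to experiment with. Repeating whole address phrases, rather than isolated road names, is another assumption, meant to give the names some of the context mentioned in the other answer.

```python
def weighted_corpus(phrases: list[str], repeat: int = 3) -> str:
    """Repeat each phrase `repeat` times, one per line, so new words occur more often."""
    if repeat < 1:
        raise ValueError("repeat must be at least 1")
    lines = [phrase for phrase in phrases for _ in range(repeat)]
    return "\n".join(lines) + "\n"


corpus = weighted_corpus(["49, Wijayaba Mawatha, Kalubowila, Dehiwela"], repeat=3)
print(corpus)
```

The resulting text would replace the old corpus (upload it again, then retrain) so that the counts actually change.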

Tony Lee, IBM Speech team