ibm-cloud speech-to-text: Is it possible to specify phonemes for custom vocabulary?


We need to build a custom model with a lot of already phonemically transcribed custom vocabulary, but the current API for specifying custom words has no published option for specifying a phonemic string rather than a manually generated, ad-hoc "sounds_like" orthographic string. Since we have not been able to find any reliable tools for generating equivalent "sounds like" strings by rule from a phoneme string, this is a real barrier to us being able to use the IBM speech-to-text engine successfully.

Is there an accepted phonetic/phonemic alphabet, and an API mechanism, for specifying a phoneme string (rather than another orthography) to indicate what custom words sound like when adding them to a custom model via the IBM Cloud speech-to-text API? That is, is there an analog to the IPA and the mechanisms for using it in IBM's text-to-speech API?

(Alternatively, does IBM or anyone out there have a good tool for converting a phoneme sequence into an orthography guaranteed to be reconverted back to the same phoneme string by their ASR engine?)

Answer by W. Sadkin:

Through IBM tech support, I found out that the API currently has a "dark" (undocumented) feature: you can specify a phoneme string in a "sounds_like" entry by enclosing it in a phoneme tag of the form "<phoneme IPA_STRING>", where IPA_STRING is the phonemic transcription.

For example, this cURL command adds the pronunciation 'hɑː.lə' for the word 'challah':

```shell
curl -u $CREDS -X PUT --header "Content-Type:application/json" \
  --data "{\"sounds_like\":[\"<phoneme hɑː.lə>\"]}" \
  https://stream.watsonplatform.net/speech-to-text/api/v1/customizations/$custID/words/challah
```
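Hand-escaping JSON inside a double-quoted shell string is easy to get wrong. Below is a minimal sketch that builds the same request body from a bare IPA string, assuming the `<phoneme ...>` syntax works as tech support described. The function names `phoneme_tag` and `word_body` are hypothetical helpers, and the IPA string is assumed to contain no double quotes or backslashes, so no JSON escaping is performed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wrap an IPA string in the undocumented <phoneme ...> tag.
phoneme_tag() {
  printf '<phoneme %s>' "$1"
}

# Hypothetical helper: JSON body for PUT .../customizations/$custID/words/<word>.
# Assumption: the IPA string contains no " or \ characters, so it is inserted as-is.
word_body() {
  printf '{"sounds_like":["%s"]}' "$(phoneme_tag "$1")"
}

body=$(word_body 'hɑː.lə')
echo "$body"

# Sending it (not executed here; $CREDS and $custID as in the cURL example above):
# curl -u "$CREDS" -X PUT -H "Content-Type: application/json" --data "$body" \
#   "https://stream.watsonplatform.net/speech-to-text/api/v1/customizations/$custID/words/challah"
```

Passing the body through `--data "$body"` keeps the shell quoting separate from the JSON quoting.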

The same format can also be used in the "sounds_like" field when building CustomWord objects and submitting them through the API.
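For a large, already transcribed vocabulary, adding words in one batch may be more practical than one PUT per word. IBM's add-custom-words method (POST to `.../customizations/$custID/words`) takes a JSON body of the form `{"words": [CustomWord, ...]}`. The sketch below converts hypothetical `word<TAB>ipa` lines into that body. Whether the undocumented phoneme tag is also accepted in this batch call is an assumption carried over from the answer's statement about CustomWord objects. `words_body` is a hypothetical helper, and it performs no JSON escaping: it assumes neither field contains `"` or `\`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "word<TAB>ipa" lines on stdin and print the JSON body
# for POST .../customizations/$custID/words, one CustomWord object per line.
# Assumption: no field contains " or \ (nothing is JSON-escaped).
words_body() {
  tab=$(printf '\t')
  sep=''
  printf '{"words":['
  # "|| [ -n "$word" ]" also handles a final line without a trailing newline.
  while IFS="$tab" read -r word ipa || [ -n "$word" ]; do
    [ -n "$word" ] || continue
    printf '%s{"word":"%s","sounds_like":["<phoneme %s>"]}' "$sep" "$word" "$ipa"
    sep=','
  done
  printf ']}'
}

body=$(printf 'challah\thɑː.lə\n' | words_body)
echo "$body"

# Sending it (not executed here; $CREDS and $custID are placeholders):
# curl -u "$CREDS" -X POST -H "Content-Type: application/json" --data "$body" \
#   "https://stream.watsonplatform.net/speech-to-text/api/v1/customizations/$custID/words"
```

In practice you would pipe a file such as `vocab.tsv` (hypothetical) into `words_body` instead of the inline `printf`.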

The accepted set of IPA symbols appears to be the same as for IBM's text-to-speech API; the US English symbols are listed here: https://cloud.ibm.com/docs/text-to-speech?topic=text-to-speech-usSymbols