I'm recording short audio files (a few seconds long) in Chrome using mediaDevices.getUserMedia(), saving each file to Firebase Storage, and then trying to send the files to IBM Watson Speech-to-Text. I'm getting back this error message:
unable to transcode data stream audio/webm -> audio/x-float-array
In the browser I set up the microphone:
navigator.mediaDevices.getUserMedia({ audio: true, video: false })
  .then(stream => {
    var options = {
      audioBitsPerSecond : 128000,
      mimeType : 'audio/webm'
    };
    const mediaRecorder = new MediaRecorder(stream, options);
    mediaRecorder.start();
    ...
According to this answer, Chrome only supports two media types:
audio/webm
audio/webm;codecs=opus
I tried both.
Here's what I sent to IBM Watson:
curl -X POST -u "apikey:my-api-key" \
--header "Content-Type: audio/webm" \
--data-binary "https://firebasestorage.googleapis.com/v0/b/my-app.appspot.com/my-file" \
--url "https://api.us-south.speech-to-text.watson.cloud.ibm.com/instances/01010101/v1/recognize"
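One thing worth checking in the command above: without a leading @, curl's --data-binary sends the quoted text itself as the request body, so Watson would receive the URL string rather than the audio. This may not be the only problem (the .ogg attempt further down uses @ and fails too), but here is a minimal sketch of downloading the recording first and then uploading its bytes. The function name transcribe_url is hypothetical, and it assumes the Firebase URL can be fetched directly with curl:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original post): download a recording,
# then POST its bytes to Watson.
# Usage: transcribe_url <audio-url> <watson-recognize-url> <api-key>
transcribe_url() {
  local audio_url="$1" watson_url="$2" api_key="$3"
  local tmpfile status=0
  tmpfile="$(mktemp)" || return 1

  # Step 1: fetch the audio bytes; --fail makes HTTP errors return non-zero.
  # Assumption: the storage URL is downloadable without extra auth headers.
  if curl --fail --silent --show-error --location \
       --output "$tmpfile" "$audio_url"; then
    # Step 2: the "@" prefix tells curl to read the request body from the file.
    curl -X POST -u "apikey:${api_key}" \
      --header "Content-Type: audio/webm" \
      --data-binary @"$tmpfile" \
      --url "$watson_url" || status=$?
  else
    status=$?
  fi

  rm -f "$tmpfile"
  return "$status"
}
```

With the placeholder values from the question, it could be called as:
transcribe_url "https://firebasestorage.googleapis.com/v0/b/my-app.appspot.com/my-file" "https://api.us-south.speech-to-text.watson.cloud.ibm.com/instances/01010101/v1/recognize" "my-api-key"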
The list of supported MIME types includes webm and webm;codecs=opus.
I also tried recording and sending an ogg format file, and got the same error message:
curl -X POST -u "apikey:my-api-key" \
--header "Content-Type: audio/ogg" \
--data-binary @/Users/TDK/LanguageTwo/public/1.ogg \
--url "https://api.us-south.speech-to-text.watson.cloud.ibm.com/instances/01010101/v1/recognize"
I tried IBM's sample audio file and it worked perfectly:
"transcript": "several tornadoes touched down as a line of severe thunderstorms swept through Colorado on Sunday "
I'm getting a similar error message from Google Cloud Speech-to-Text.
Create a bash script called watsonstt.sh (I recommend saving it in ~/bin/), paste in the contents below, replace the contents of the apikey, url, and savepath variables with your own, and call the script as the comment recommends, quoting the single argument (to deal with spaces). At the time of writing, API credentials are supplied in the 'Manage' tab of the IBM Watson cloud web interface, and you need to sign up with credit/debit card details.
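The script body is not reproduced in this copy of the answer. What follows is not the original script, only a minimal sketch of what such a script might look like. It assumes the single argument is a path to a local audio file, that savepath is a directory for the JSON response, and that url is the service instance URL from the credentials page. The function names content_type_for and transcribe_file are made up for illustration, and all three variable values are placeholders:

```shell
#!/usr/bin/env bash
# watsonstt.sh - illustrative sketch only, NOT the original script.
# Usage: watsonstt.sh "/path/to/my recording.webm"

apikey="my-api-key"   # placeholder: your API key
url="https://api.us-south.speech-to-text.watson.cloud.ibm.com/instances/01010101"  # placeholder
savepath="$HOME/transcripts"   # placeholder: assumed to be an output directory

# Map a file extension to a Content-Type value (hypothetical helper).
content_type_for() {
  case "${1##*.}" in
    webm) echo "audio/webm" ;;
    ogg)  echo "audio/ogg" ;;
    flac) echo "audio/flac" ;;
    wav)  echo "audio/wav" ;;
    mp3)  echo "audio/mp3" ;;
    *)    return 1 ;;
  esac
}

# Upload one file and save the JSON response next to the other transcripts.
transcribe_file() {
  local infile="$1" ctype outfile
  [ -f "$infile" ] || { echo "no such file: $infile" >&2; return 1; }
  ctype="$(content_type_for "$infile")" \
    || { echo "unsupported extension: $infile" >&2; return 1; }
  mkdir -p "$savepath" || return 1
  outfile="$savepath/$(basename "${infile%.*}").json"
  # "@" makes curl send the file's bytes rather than the path text.
  curl -X POST -u "apikey:${apikey}" \
    --header "Content-Type: ${ctype}" \
    --data-binary @"$infile" \
    --output "$outfile" \
    --url "${url}/v1/recognize" && echo "$outfile"
}

# Only run when an argument is given, so the file can also be sourced.
if [ "$#" -gt 0 ]; then
  transcribe_file "$1"
fi
```

Quoting the argument matters because of the spaces: watsonstt.sh "/Users/TDK/LanguageTwo/public/1.ogg" is passed as a single path, and on success the sketch prints the path of the saved JSON file.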