IBM Watson Speech-to-Text "unable to transcode data stream audio/webm -> audio/x-float-array" media MIME types


I'm recording short audio files (a few seconds) in Chrome using mediaDevices.getUserMedia(), saving the file to Firebase Storage, and then trying to send the files to IBM Watson Speech-to-Text. I'm getting back this error message:

unable to transcode data stream audio/webm -> audio/x-float-array

In the browser I set up the microphone:

navigator.mediaDevices.getUserMedia({ audio: true, video: false })
.then(stream => {

var options = {
   audioBitsPerSecond : 128000,
   mimeType : 'audio/webm'
};

const mediaRecorder = new MediaRecorder(stream, options);
mediaRecorder.start();
...

According to this answer, Chrome supports only two media types:

audio/webm
audio/webm;codecs=opus

I tried both.

Here's what I sent to IBM Watson:

curl -X POST -u "apikey:my-api-key" \
--header "Content-Type: audio/webm" \
--data-binary "https://firebasestorage.googleapis.com/v0/b/my-app.appspot.com/my-file" \
--url "https://api.us-south.speech-to-text.watson.cloud.ibm.com/instances/01010101/v1/recognize"

The list of supported MIME types includes webm and webm;codecs=opus.

I also tried recording and sending an Ogg file, and got the same error message:

curl -X POST -u "apikey:my-api-key" \
--header "Content-Type: audio/ogg" \
--data-binary @/Users/TDK/LanguageTwo/public/1.ogg \
--url "https://api.us-south.speech-to-text.watson.cloud.ibm.com/instances/01010101/v1/recognize"

I tried IBM's sample audio file and it worked perfectly:

"transcript": "several tornadoes touched down as a line of severe thunderstorms swept through Colorado on Sunday "

I'm getting a similar error message from Google Cloud Speech-to-Text.

Answered by Phil Ricketts

Create a bash script called watsonstt.sh (I recommend saving it in ~/bin/), paste in the contents below, and replace the values of the apikey, url, and savepath variables with your own. Then call the script as the comment describes, quoting the single argument so that paths containing spaces work.

At the time of writing, API credentials are available in the 'Manage' tab of the IBM Watson cloud web interface, and signing up requires credit or debit card details.


#!/bin/bash

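# Call this script with one argument, a POSIX file path in quotes, e.g.:
# watsonstt.sh "/user/name/file.mp3"

# 500 mins per month for free
# https://www.ibm.com/watson/developercloud/speech-to-text/api/v1/curl.html?curl#get-token

apikey=XXXXXXXXXXXX   # your API key
url=YYYYYYYYYY        # your service instance URL
# The response is saved on the Desktop under the input file's name plus .txt
savepath=~/Desktop/${1##*/}.txt

# The Content-Type subtype is taken from the file extension (mp3 -> audio/mp3).
# The @ prefix makes curl upload the contents of the file named in $1.
curl -X POST -u "apikey:$apikey" \
  --header "Content-Type: audio/${1##*.}" \
  --data-binary @"$1" \
  "$url/v1/recognize?timestamps=true&max_alternatives=3" \
  -o "${savepath}"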
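
The script builds the Content-Type header straight from the file extension, which only works when the extension happens to match an audio subtype the service accepts. As a rough sketch, the mapping could be made explicit so that an unexpected extension fails early instead of producing a transcoding error. The function name content_type_for is hypothetical, and the list of types is an assumption that should be checked against IBM's list of supported audio formats.

```shell
#!/bin/bash
# Sketch only: content_type_for is a hypothetical helper, not part of any IBM tool.
# The extension-to-type table is an assumption; verify it against IBM's
# documented list of supported audio formats before relying on it.

content_type_for() {
  # Take everything after the last dot in the path as the extension.
  local ext="${1##*.}"
  # Lowercase it with tr so this also works on older bash versions.
  ext="$(printf '%s' "$ext" | tr '[:upper:]' '[:lower:]')"
  case "$ext" in
    mp3)     echo "audio/mp3" ;;
    wav)     echo "audio/wav" ;;
    flac)    echo "audio/flac" ;;
    ogg|oga) echo "audio/ogg" ;;
    webm)    echo "audio/webm" ;;
    *)       return 1 ;;   # unknown extension: let the caller decide what to do
  esac
}

# Possible use inside the script above (apikey and url as defined there):
#   ctype="$(content_type_for "$1")" || { echo "unsupported extension: $1" >&2; exit 1; }
#   curl -X POST -u "apikey:$apikey" --header "Content-Type: $ctype" \
#     --data-binary @"$1" "$url/v1/recognize"
```

Note that the @ in front of "$1" is what tells curl to read the request body from that file. A --data-binary value without @ is sent as literal text, so passing a URL string there, as in the first command in the question, posts the URL itself rather than the audio. That may be why the service could not transcode the data.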