Does Azure's Speech to Text service accept WebM audio and does it offer an output with timestamps?


I'm trying to decide whether Azure is the best platform for my transcription needs.

I have two questions. Does Azure's Speech to Text service:

  1. Accept WebM audio as input?
  2. Offer an output with timestamps?

There are 2 answers

Edward Gahan

As far as I know, Microsoft Cognitive Services Speech to Text only accepts WAV or OGG audio files, and I don't think it handles containers such as WebM or MKV.

We're a new transcription startup called 3Scribe (we think the most accurate on the market), and we can handle WebM containers as input. Our JSON output includes timestamps, and we are about to launch custom outputs, so if you're looking for something specific, drop us a line at our support email. If you sign up and let us know, quoting this thread, I can add some extra credit to your account.

AmitShukla

The Microsoft Speech SDK also supports the WebM container. Please follow the sample linked below: replace the file name with your WebM file and set the format to AudioStreamContainerFormat.ANY. You also need to install GStreamer on the client machine.

https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/770e1094a94ab67febeb737f2a4fb75c591b8231/samples/csharp/sharedcontent/console/speech_recognition_samples.cs#L248
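The linked sample is C#. For readers on Python, the same idea might look roughly like the sketch below. It assumes the azure-cognitiveservices-speech package and GStreamer are installed; the function names iter_chunks and recognize_webm_once, and the key, region, and path arguments, are placeholders invented for this illustration and are not taken from the sample.

```python
from typing import Iterator


def iter_chunks(path: str, chunk_size: int = 4096) -> Iterator[bytes]:
    """Yield the raw bytes of a file in fixed-size chunks (hypothetical helper)."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk


def recognize_webm_once(path: str, key: str, region: str):
    """Sketch: push a WebM file into the Speech SDK and recognize one utterance."""
    # Imported lazily so iter_chunks can be used without the SDK installed.
    import azure.cognitiveservices.speech as speechsdk

    # Container format ANY, as the answer suggests; the SDK relies on
    # GStreamer to decode compressed input.
    stream_format = speechsdk.audio.AudioStreamFormat(
        compressed_stream_format=speechsdk.AudioStreamContainerFormat.ANY
    )
    push_stream = speechsdk.audio.PushAudioInputStream(stream_format=stream_format)

    speech_config = speechsdk.SpeechConfig(subscription=key, region=region)
    audio_config = speechsdk.audio.AudioConfig(stream=push_stream)
    recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config, audio_config=audio_config
    )

    # Feed the still-compressed bytes to the SDK, then signal end of stream.
    for chunk in iter_chunks(path):
        push_stream.write(chunk)
    push_stream.close()

    # recognize_once returns only the first utterance; longer files would
    # need continuous recognition instead.
    result = recognizer.recognize_once()
    return result.text, result.offset, result.duration
```

A call such as recognize_webm_once("meeting.webm", "<your key>", "<your region>") would return the recognized text together with its offset and duration; the file name here is only an example.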

By the way, which platform and which language are you using?

We also support timestamps, with the actual offset of each transcription. The JSON output looks like this: {"Id":"1384bb2080b54ce6bec99e3342092610","RecognitionStatus":"Success","DisplayText":"What brings you to the land of the gatekeepers?","Offset":120100000,"Duration":24700000}

Here Offset is the time from the beginning of the stream and Duration is the length of the whole recognized text. Both are expressed in 100-nanosecond ticks.
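To turn those values into something readable, a small sketch like the one below may help. It parses the JSON shown above and converts Offset and Duration to seconds, on the basis that the service reports them in 100-nanosecond ticks. The function name phrase_timing and the output keys are made up for this example.

```python
import json

TICKS_PER_SECOND = 10_000_000  # one tick is 100 nanoseconds


def phrase_timing(result_json: str) -> dict:
    """Return the recognized text with start and end times in seconds."""
    data = json.loads(result_json)
    start_ticks = data["Offset"]
    end_ticks = start_ticks + data["Duration"]
    return {
        "text": data["DisplayText"],
        "start_s": start_ticks / TICKS_PER_SECOND,
        "end_s": end_ticks / TICKS_PER_SECOND,
    }


# The JSON result quoted in the answer above.
sample = (
    '{"Id":"1384bb2080b54ce6bec99e3342092610","RecognitionStatus":"Success",'
    '"DisplayText":"What brings you to the land of the gatekeepers?",'
    '"Offset":120100000,"Duration":24700000}'
)

timing = phrase_timing(sample)
print(f'{timing["start_s"]:.2f}s to {timing["end_s"]:.2f}s: {timing["text"]}')
```

For the result above, the phrase starts about 12.01 seconds into the stream and ends at about 14.48 seconds.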