Microsoft Translator Speech missing punctuation


I am using the MS Translator Speech WebSocket API for real-time speech recognition and translation. The problem is that the recognised text sometimes has no punctuation (commas, full stops, etc.). The transcribed text looks good otherwise. I also receive an MP3 with the synthesised translation.

It looks completely random: I can send the same audio multiple times, and some responses have punctuation while others do not. I am sending the audio in the correct format and at a near real-time rate, i.e. I send 100 ms samples every ~100 ms. The recognised language is Spanish.

Is this a common issue or is there some other catch?


There are 2 answers

0
shelll On BEST ANSWER

Switching to the Speech Preview API solved the missing punctuation. For now there are only SDKs, and the raw WebSocket API is not yet documented. I have managed to connect to and use the WS API; more info is in another SO question.

1
Chris Wendt On

There are different response types for partial recognitions and the final recognition. You receive partial recognitions as the speech continues to come in, and one final recognition at the end of the utterance. The partial results may be missing punctuation and casing; the final one will have both. If you want to ignore the responses without casing and punctuation, filter the responses so that you only process the final ones.
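
As a rough illustration of that filtering, here is a minimal Python sketch. It assumes each text message from the service is a JSON object with a `type` field set to `"partial"` or `"final"` and a `recognition` field holding the text, and that binary frames carry the synthesised audio. The function name `final_recognitions` is made up for this example, so check the field names against the responses you actually receive.

```python
import json
from typing import Iterable, Iterator, Union


def final_recognitions(messages: Iterable[Union[str, bytes]]) -> Iterator[dict]:
    """Yield only the messages marked as final results.

    Assumption: text frames are JSON objects with a "type" field that is
    either "partial" or "final". Binary frames are assumed to be audio.
    """
    for message in messages:
        if isinstance(message, bytes):
            # Binary frame (assumed to be synthesised audio): not a transcript.
            continue
        try:
            payload = json.loads(message)
        except json.JSONDecodeError:
            # Not JSON, so it cannot be a recognition result.
            continue
        if isinstance(payload, dict) and payload.get("type") == "final":
            yield payload


# Example with hand-written messages (not real service output).
sample = [
    '{"type": "partial", "id": "0.1", "recognition": "hola como"}',
    b"\x00\x01",
    '{"type": "final", "id": "0", "recognition": "Hola, \\u00bfc\\u00f3mo est\\u00e1s?"}',
]
for result in final_recognitions(sample):
    print(result["recognition"])
```

With this in place, the partial results (which may lack punctuation) are simply dropped, and only the end-of-utterance text reaches the rest of your code.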