I am using the Microsoft Translator Speech WebSocket API for real-time speech recognition and translation. The problem is that the recognised text sometimes has no punctuation (commas, full stops, etc.). The transcribed text looks good otherwise. I also receive an MP3 with the synthesised translation.
It looks completely random: I can send the same audio multiple times, and some responses have punctuation while others do not. I am sending the audio in the correct format and at a near real-time rate, e.g. I send 100 ms chunks roughly every 100 ms. The recognised language is Spanish.
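To make the pacing concrete, here is a minimal sketch of that kind of near real-time sending. It assumes 16 kHz, 16-bit mono PCM, and the names `chunk_pcm`, `send_paced` and the `send` callback are hypothetical placeholders for illustration, not part of any Microsoft SDK. It only shows the chunking and timing, and is not a fix for the missing punctuation.

```python
import time
from typing import Callable, Iterator

# Assumed audio format (not stated in the post): 16 kHz, 16-bit, mono PCM.
SAMPLE_RATE_HZ = 16_000
BYTES_PER_SAMPLE = 2
CHANNELS = 1
CHUNK_MS = 100  # chunk duration described in the post


def chunk_pcm(pcm: bytes, chunk_ms: int = CHUNK_MS) -> Iterator[bytes]:
    """Split raw PCM bytes into chunks of chunk_ms milliseconds each.

    The last chunk may be shorter if the audio does not divide evenly.
    """
    chunk_bytes = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS * chunk_ms // 1000
    for offset in range(0, len(pcm), chunk_bytes):
        yield pcm[offset:offset + chunk_bytes]


def send_paced(
    pcm: bytes,
    send: Callable[[bytes], None],
    chunk_ms: int = CHUNK_MS,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Send chunks at roughly real-time rate and return how many were sent.

    `send` is a hypothetical callback, e.g. the binary-send method of
    whatever WebSocket client is in use.
    """
    start = clock()
    sent = 0
    for index, chunk in enumerate(chunk_pcm(pcm, chunk_ms)):
        # Schedule against the start time so small delays do not accumulate.
        due = start + index * chunk_ms / 1000
        delay = due - clock()
        if delay > 0:
            sleep(delay)
        send(chunk)
        sent += 1
    return sent
```

In a real client, `send` would be the WebSocket connection's method for sending binary frames, and `pcm` would be the audio payload without any container header.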
Is this a common issue or is there some other catch?
Switching to the Speech Preview API solved the missing punctuation. For now there are only SDKs, and the raw WebSocket API is not yet documented. I have managed to connect to and use the WS API; there is more information in another SO question.