Google-Cloud-Speech: Latency in first interim result for StreamingRecognize method


We are trying to use the Google Cloud Speech StreamingRecognize method via the Java client. We read audio from the microphone and send it to the Speech API with the following settings: RecognitionConfig: LINEAR16, 16 kHz, en-US. We tried pushing different buffer sizes to StreamingRecognize (up to 16000 bytes). In every case it takes a minimum of 4-5 seconds to get the first result; thereafter, interim results are streamed. Can anybody confirm whether this is the intended behavior of the API? It would also be nice to know why there is so much latency. Is there any method or workaround to minimize it?

Please note that after this initial delay we do get the interim results and, finally, the full utterance with reasonable accuracy.


There is 1 answer

user3776111 On

I suspect two things are wrong, going by the description:

  1. The sample rate should not be hardcoded as a fixed constant in your Java service, because it varies with the system and the microphone adapter installed on it (e.g. 8000, 16000, 44100, 48000 Hz). Pick up the sample rate from the audio context of your microphone and send it in the initial call, so that it is set on the RecognitionConfig.

  2. If you are streaming through a WebSocket, send the sample rate and bytes per frame during the connection handshake and pass them to the request observer in the first request. From the second request onward, skip that configuration step and pass the audio directly to the request observer to get the transcript.
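
As a rough illustration of both points, here is a minimal sketch that uses only the JDK, not the Speech client. The class `StreamingSetupSketch`, its `Request` type and the 100 ms chunk length are all hypothetical, chosen just for this example; `Request` merely stands in for whatever request object your client sends. It reads the sample rate and frame size from an `AudioFormat` instead of hardcoding them, then builds a sequence in which the first request carries only the configuration and every later request carries only audio.

```java
import javax.sound.sampled.AudioFormat;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class StreamingSetupSketch {

    /** Hypothetical stand-in for one streaming request: config OR audio, never both. */
    public static final class Request {
        public final Integer sampleRateHertz; // non-null only on the first (config) request
        public final byte[] audio;            // non-null only on the audio requests

        private Request(Integer sampleRateHertz, byte[] audio) {
            this.sampleRateHertz = sampleRateHertz;
            this.audio = audio;
        }
    }

    /** Number of bytes that hold chunkMillis of audio in the given PCM format. */
    public static int bytesPerChunk(AudioFormat format, int chunkMillis) {
        int frames = Math.round(format.getSampleRate() * chunkMillis / 1000f);
        return frames * format.getFrameSize(); // frame size = bytes per sample * channels
    }

    /** First request: config taken from the format. Remaining requests: audio chunks only. */
    public static List<Request> buildRequests(AudioFormat format, byte[] pcm, int chunkMillis) {
        List<Request> requests = new ArrayList<>();
        requests.add(new Request(Math.round(format.getSampleRate()), null));
        int chunk = bytesPerChunk(format, chunkMillis); // assumes chunkMillis > 0
        for (int off = 0; off < pcm.length; off += chunk) {
            int end = Math.min(off + chunk, pcm.length);
            requests.add(new Request(null, Arrays.copyOfRange(pcm, off, end)));
        }
        return requests;
    }

    public static void main(String[] args) {
        // In a real capture path this format would come from TargetDataLine.getFormat();
        // it is constructed by hand here so the sketch runs without a microphone.
        AudioFormat format = new AudioFormat(16000f, 16, 1, true, false);
        byte[] oneSecond = new byte[bytesPerChunk(format, 1000)]; // silent placeholder audio
        List<Request> requests = buildRequests(format, oneSecond, 100);
        System.out.println("bytes per 100 ms chunk: " + bytesPerChunk(format, 100));
        System.out.println("requests (1 config + audio): " + requests.size());
    }
}
```

With the real client you would map the first `Request` to the config message and the rest to audio messages. Whether a smaller chunk actually shortens the 4-5 second delay is something to measure, not something this sketch shows.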

If the above points don't work, share your StreamingRecognize class so I can tune your code accordingly.