I'm using the Google Speech-to-Text API to transcribe speech in real time. I connect via a WebSocket, stream audio data to it in real time, and receive transcripts/words back.
It all works fine in English (using languageCode: 'en-GB'), but when I try French it takes ages to recognise the end of an utterance and send data back to me, and much of what is said seems to be ignored. I've tried with native French speakers too.
Below is the setup code on the Node backend:
const recognitionStream = client
  .streamingRecognize({
    config: {
      encoding: 'LINEAR16',
      sampleRateHertz: 16000,
      languageCode: 'fr-FR',
      profanityFilter: false,
      enableWordTimeOffsets: true,
    },
    interimResults: false,
    singleUtterance: false,
  })
  .on('error', (err) => console.error(err))
  .on('data', (data) => {
    // Note: `data?.results[0]` would still throw if `results` were undefined,
    // so the optional chaining has to cover each step of the lookup.
    const transcript = data.results?.[0]?.alternatives?.[0]?.transcript || ''
    if (transcript === '') {
      console.error('Reached transcription time limit')
    }
    if (data.results?.[0]?.isFinal) {
      console.info('END OF UTTERANCE')
      console.info(transcript)
    }
  })
As mentioned, the only difference in code between my English and French versions is the languageCode being 'fr-FR' instead of 'en-GB'. I do occasionally get a transcript through, but much less often and with a long delay relative to when the utterance was actually spoken. It is, however, in French, as I would expect.
Any thoughts on why it is so slow? It's not fit for use at the current rate.