Problem with total billed duration while transcribing stream audio with Google Speech-to-Text v2


We are starting to use Google Speech-to-Text v2 for streaming transcription of long audio. The problem is that at every step of the conversation (every single pause) we receive an "isFinal=true" result, and we are being charged for the sum of the offsets of all these intermediate results.

Example: we have a 50-second conversation. We receive partial streaming results (each with isFinal=true) at 20 seconds, at 30 seconds, and at 50 seconds. Instead of being charged for 50 seconds, the returned totalBilledDuration field reports 100 seconds (20 + 30 + 50).
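The arithmetic behind that figure can be checked quickly. This is only a sketch of the hypothesis stated in the question, namely that the billed total equals the sum of the end offsets of the final results; the function name is made up for illustration and is not part of any Google API.

```python
def summed_end_offsets(final_offsets_s):
    """Sum the end offsets (in seconds) of every final result."""
    return sum(final_offsets_s)


# End offsets, in seconds, of the three final results in the example.
final_offsets_s = [20, 30, 50]

# The audio actually streamed ends at the last offset.
audio_length_s = max(final_offsets_s)

# Hypothesis from the question: billing adds up every offset (20 + 30 + 50).
suspected_billing_s = summed_end_offsets(final_offsets_s)

print(audio_length_s, suspected_billing_s)
```

If the hypothesis holds, the second number matches the billed value reported by the API while the first matches the real audio length.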

The audio we send is mono, 8 kHz.

Does anybody have a clue why this is happening, or how it can be avoided?

Regards


There is 1 answer

Poala Astrid On

Google's Speech-to-Text API may segment an audio stream based on pauses or other audio characteristics. Even within a single continuous conversation, the API might treat a pause as the end of an utterance and mark the result for that segment as final. This is why you can receive several final results before the conversation as a whole has ended.
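To make the idea of pause-based segmentation concrete, here is a toy model. It is not Google's endpointing algorithm; the function, the one-second frames, and the two-second pause threshold are all assumptions chosen for illustration.

```python
def split_on_pauses(voiced, frame_s=1.0, min_pause_s=2.0):
    """Return the end time (seconds) of each segment closed by a pause.

    `voiced` holds one boolean per frame: True = speech, False = silence.
    A segment is closed once silence has lasted at least `min_pause_s`.
    """
    ends = []
    silence = 0.0
    in_speech = False
    for i, is_voiced in enumerate(voiced):
        if is_voiced:
            in_speech = True
            silence = 0.0
        elif in_speech:
            silence += frame_s
            if silence >= min_pause_s:
                # The segment ended where the silence began.
                ends.append((i + 1) * frame_s - silence)
                in_speech = False
                silence = 0.0
    if in_speech:
        # Close the last segment at the end of the stream.
        ends.append(len(voiced) * frame_s - silence)
    return ends


# 5 s speech, 2 s pause, 4 s speech, 3 s pause, 3 s speech.
frames = [True] * 5 + [False] * 2 + [True] * 4 + [False] * 3 + [True] * 3
segment_ends = split_on_pauses(frames)
print(segment_ends)
```

One uninterrupted stream produces three segment boundaries here, which is the same shape as the three final results described in the question.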

Review the settings and parameters you pass when making requests to the API. Make sure your configuration is not causing results to be finalized more often than you need, which could lead to additional billing. Double-check the parameters related to streaming recognition and to how the end of speech is detected.
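While reviewing the configuration, it can help to log what the stream actually returns. The sketch below uses hypothetical stand-in classes instead of the real client library so that it runs without credentials. The attribute names `is_final`, `result_end_offset`, and `total_billed_duration` are assumed to mirror the v2 response fields (where the billed duration is nested under the response metadata rather than flat as here), so verify them against the client library you use.

```python
from dataclasses import dataclass, field
from datetime import timedelta
from typing import List, Optional


@dataclass
class FakeResult:
    """Hypothetical stand-in for one streaming result."""
    is_final: bool
    result_end_offset: timedelta


@dataclass
class FakeResponse:
    """Hypothetical stand-in for one streaming response."""
    results: List[FakeResult] = field(default_factory=list)
    total_billed_duration: Optional[timedelta] = None


def summarize(responses):
    """Collect final-result offsets and every billed value that was reported."""
    final_offsets_s = []
    billed_values_s = []
    for response in responses:
        for result in response.results:
            if result.is_final:
                final_offsets_s.append(result.result_end_offset.total_seconds())
        if response.total_billed_duration is not None:
            billed_values_s.append(response.total_billed_duration.total_seconds())
    return {
        "final_offsets_s": final_offsets_s,
        "last_offset_s": max(final_offsets_s, default=0.0),
        "sum_of_offsets_s": sum(final_offsets_s),
        "billed_values_s": billed_values_s,
    }


# Made-up data shaped like the example in the question.
demo = [
    FakeResponse([FakeResult(True, timedelta(seconds=20))]),
    FakeResponse([FakeResult(False, timedelta(seconds=25))]),
    FakeResponse([FakeResult(True, timedelta(seconds=30))]),
    FakeResponse([FakeResult(True, timedelta(seconds=50))], timedelta(seconds=100)),
]
report = summarize(demo)
print(report)
```

Comparing `last_offset_s` and `sum_of_offsets_s` with the billed values shows quickly whether the total tracks the real audio length or the accumulated offsets.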

If possible, consider streaming the audio continuously over a single stream instead of sending it as discrete chunks in separate requests. Continuous streaming can help maintain context and reduce the likelihood of results being marked as final unnecessarily. However, this approach may require changes to your application architecture and to how you handle audio data.
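A minimal sketch of the client side of continuous streaming: a generator that slices a buffer into small, evenly sized chunks that can be fed one after another into a single stream. The question only states mono 8 kHz, so the 16-bit linear PCM encoding and the 100 ms chunk size are assumptions, and wrapping each chunk in a real streaming request is left out.

```python
SAMPLE_RATE_HZ = 8000   # from the question: mono, 8 kHz
BYTES_PER_SAMPLE = 2    # assumption: 16-bit linear PCM
CHUNK_MS = 100          # assumption: 100 ms of audio per chunk


def audio_chunks(pcm: bytes, chunk_ms: int = CHUNK_MS):
    """Yield consecutive slices of `pcm`, each holding `chunk_ms` of audio."""
    chunk_bytes = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * chunk_ms // 1000
    for start in range(0, len(pcm), chunk_bytes):
        yield pcm[start:start + chunk_bytes]


# One second of silence as placeholder audio.
one_second = bytes(SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)
chunks = list(audio_chunks(one_second))
print(len(chunks), len(chunks[0]))
```

In a real application the generator would read from the live audio source and stay open for the whole conversation, so the recognizer sees one uninterrupted stream.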