We're running into an issue trying to use Google Cloud Speech (GCS) for audio-indexing purposes. We've tried two different setups (a sketch of the request we send follows the list):
1. A single audio file containing multiple speakers (high SNR, only speech plus silence) is sent to GCS.
2. The audio file is split by speaker, the per-speaker segments are concatenated, and one audio file per speaker is sent to GCS.
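For reference, this is roughly how we send each file (a minimal sketch using the Python client library; the bucket URI, encoding, and sample rate are placeholders for our actual values):

```python
from google.cloud import speech

client = speech.SpeechClient()

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="en-US",
    enable_word_time_offsets=True,  # so results can be mapped back to the timeline
)
# Placeholder URI; in setup 2 there is one such file per speaker.
audio = speech.RecognitionAudio(uri="gs://our-bucket/speaker_1.wav")

# Long-running recognition, since the files are longer than one minute.
operation = client.long_running_recognize(config=config, audio=audio)
response = operation.result(timeout=600)

for result in response.results:
    best = result.alternatives[0]  # the top-ranked hypothesis
    print(best.confidence, best.transcript)
```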
The problem is that a large portion (~22%) of the speech gets no output hypotheses at all, regardless of setup (1 or 2 above).
The documentation states that "If the Speech API determines that an alternative has a sufficient Confidence Value, then that alternative is included in the response." Is this also true for the best hypothesis, i.e. is it only included if its confidence is high enough, and is that why parts of the speech are missing? To see which time ranges come back empty, we inspect the word time offsets of each returned result, as in the sketch below.
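This is how we check coverage (a sketch; it assumes `enable_word_time_offsets=True` was set in the config above and a recent client version where offsets are `timedelta` objects):

```python
# Print the time span and confidence of each returned result so the
# uncovered (~22%) regions of the timeline stand out.
for result in response.results:
    best = result.alternatives[0]
    if not best.words:
        continue
    start = best.words[0].start_time.total_seconds()
    end = best.words[-1].end_time.total_seconds()
    print(f"{start:8.2f}s - {end:8.2f}s  confidence={best.confidence:.2f}")
```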
And the actual question, as per the title: is it possible to get a best guess for the entire input audio from Google Cloud Speech?