Is it possible to get a best guess for the entire input audio from Google Cloud Speech?


We're running into an issue trying to use Google Cloud Speech (GCS) for audio-indexing purposes. We've tried two different setups:

  1. A single audio file containing multiple speakers (high SNR, only speech + silence) is sent to GCS.
  2. The audio file is split by speaker, each speaker's segments are concatenated, and one audio file per speaker is sent to GCS.

The problem is that large parts (~22%) of the speech don't get any output hypotheses, regardless of setup (1 or 2 above).
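For reference, a figure like the ~22% can be measured roughly as follows. This is a minimal sketch, not our exact tooling: it assumes the speech intervals come from our own segmentation and the recognized intervals from word-level timestamps in the responses. The names `merge` and `uncovered_fraction` and the example intervals are illustrative only.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_sec, end_sec)


def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and merge any that overlap or touch."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def uncovered_fraction(speech: List[Interval], recognized: List[Interval]) -> float:
    """Fraction of total speech time with no recognized hypothesis."""
    speech = merge(speech)
    recognized = merge(recognized)
    total = sum(end - start for start, end in speech)
    if total == 0:
        return 0.0
    covered = 0.0
    for s_start, s_end in speech:
        for r_start, r_end in recognized:
            # Length of the overlap between the two intervals (0 if disjoint).
            covered += max(0.0, min(s_end, r_end) - max(s_start, r_start))
    return 1.0 - covered / total


# Hypothetical example: 20 s of speech, 15.6 s of it covered by hypotheses.
speech = [(0.0, 10.0), (12.0, 22.0)]
recognized = [(0.0, 8.0), (12.0, 19.6)]
print(round(uncovered_fraction(speech, recognized), 2))
```

Merging first keeps overlapping word or segment timestamps from being counted twice.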

The documentation states that "If the Speech API determines that an alternative has a sufficient Confidence Value, then that alternative is included in the response." Is this also true for the best hypothesis, i.e. is it only included if its confidence is high enough, and is that why parts of the speech are missing?

And the actual question, as per the title: Is it possible to get a best guess for the entire input audio from Google Cloud Speech?


There are 0 answers