Google Cloud Speech API returns nothing for audio longer than 1 minute

Audio files shorter than 1 minute are transcribed without problem, but when I attempt to transcribe a longer file, the Google Speech API returns an empty response.

I make my .wav file using the following SoX command:

sox input.flac --channels=1 --bits=16 --rate=16000 --encoding=signed-integer --endian=little output.wav

The file plays as expected. Running SoXi, I get the following information:

Input File     : 'output.wav'
Channels       : 1
Sample Rate    : 16000
Precision      : 16-bit
Duration       : 00:02:35.71 = 2491408 samples ~ 11678.5 CDDA sectors
File Size      : 4.98M
Bit Rate       : 256k
Sample Encoding: 16-bit Signed Integer PCM
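
As a quick sanity check, the SoXi figures above can be recomputed from the sample count. This is only an arithmetic sketch; the variable names are illustrative and the 60-second threshold is the limit described in the question.

```php
<?php
// Values copied from the SoXi output above.
$samples    = 2491408;
$sampleRate = 16000; // Hz
$bits       = 16;
$channels   = 1;

$durationSeconds = $samples / $sampleRate;           // 155.713
$audioBytes      = $samples * $channels * $bits / 8; // 4982816 (excluding the WAV header)
$bitRate         = $sampleRate * $bits * $channels;  // 256000

printf("duration: %.2f s\n", $durationSeconds);
printf("audio data: %d bytes\n", $audioBytes);
printf("bit rate: %d bit/s\n", $bitRate);
printf("longer than 60 s: %s\n", $durationSeconds > 60 ? 'yes' : 'no');
```

The numbers agree with the reported 00:02:35.71 duration, 4.98M file size and 256k bit rate, so the file itself looks consistent and is well past the one-minute mark.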

I then upload it to Google Cloud Storage, because the documentation states that any audio longer than 1 minute must reside in a gs:// bucket for the API to transcribe it.

I then run the following code to start the transcription operation:

use \Google\Cloud\ServiceBuilder;

$cloud = new ServiceBuilder([
    'keyFilePath' => '/var/www/cert/gcloud_key.json',
    'projectId' => 'm****n-141000'
]);

$speech = $cloud->speech();

$operation = $speech->beginRecognizeOperation(
    "gs://m****n-141000.appspot.com/output.wav", [
    'encoding' => 'LINEAR16',
    'sampleRate' => 16000
]);

// Poll once per second until the operation reports completion.
$isComplete = $operation->isComplete();

while (!$isComplete) {
    sleep(1);
    $operation->reload();
    $isComplete = $operation->isComplete();
}

var_dump($operation->results());

The response coming back is empty. The full response looks like this:

object(stdClass)#27 (4) {
  ["name"]=>
  string(19) "1904326252537199795"
  ["metadata"]=>
  object(stdClass)#24 (4) {
    ["@type"]=>
    string(70) "type.googleapis.com/google.cloud.speech.v1beta1.AsyncRecognizeMetadata"
    ["progressPercent"]=>
    int(100)
    ["startTime"]=>
    string(27) "2017-01-02T09:36:45.780425Z"
    ["lastUpdateTime"]=>
    string(27) "2017-01-02T09:36:46.720260Z"
  }
  ["done"]=>
  bool(true)
  ["response"]=>
  object(stdClass)#26 (1) {
    ["@type"]=>
    string(70) "type.googleapis.com/google.cloud.speech.v1beta1.AsyncRecognizeResponse"
  }
}

This suggests that the request ran and completed successfully, but returned no transcription. Where am I going wrong?
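
To make the "done but empty" state explicit, here is a minimal sketch that classifies a decoded operation object like the one dumped above. The function name describeOperation is hypothetical, and the "error" and "results" field names are assumptions based on the usual shape of a long-running operation and of AsyncRecognizeResponse; neither field appears in the dump.

```php
<?php
/**
 * Summarise a decoded long-running operation object.
 * "done" and "response" come from the dump; "error" and "results" are assumed field names.
 */
function describeOperation(stdClass $op): string
{
    if (empty($op->done)) {
        return 'pending';
    }
    if (isset($op->error)) {
        return 'failed';
    }
    // A finished operation with no results array is the case seen in the question.
    $results = $op->response->results ?? [];
    return count($results) > 0 ? 'transcribed' : 'done-but-empty';
}

// Rebuild the relevant part of the dump shown above.
$op = json_decode('{"name":"1904326252537199795","done":true,'
    . '"response":{"@type":"type.googleapis.com/google.cloud.speech.v1beta1.AsyncRecognizeResponse"}}');

echo describeOperation($op), "\n";
```

For the dump in the question this reports the finished-but-empty case, which separates "the API heard nothing it could transcribe" from "the operation failed".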

There are 3 answers

Aakash282 On

See the documentation here: https://cloud.google.com/speech/docs/basics

Note that no results are yet present. The Speech API will continue to process the supplied audio and use this operation to store eventual results, which will appear within the operation's response field (of type AsyncRecognizeResponse) upon completion of the request.

I assume there is a way to provide a callback function that will handle the actual response that includes transcription.
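
Whether the client library offers such a callback is not confirmed here. As a rough sketch of the same idea, a small polling helper can take the callback itself. Everything in it (pollUntilDone, its parameters, the attempt limit) is hypothetical and independent of the Speech client; it only assumes you can supply a "done yet?" check and a "fetch results" function.

```php
<?php
/**
 * Call $isDone repeatedly until it returns true or $maxAttempts is used up.
 * On success, pass the value of $fetch() to $onComplete and return true.
 * Return false if the attempts run out first.
 */
function pollUntilDone(
    callable $isDone,
    callable $fetch,
    callable $onComplete,
    int $maxAttempts = 60,
    int $delaySeconds = 1
): bool {
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        if ($isDone()) {
            $onComplete($fetch());
            return true;
        }
        sleep($delaySeconds);
    }
    return false;
}

// Stand-in demo: the "operation" completes on the third check.
$checks = 0;
$received = null;
$finished = pollUntilDone(
    function () use (&$checks) { return ++$checks >= 3; },
    function () { return ['example transcript']; },
    function ($results) use (&$received) { $received = $results; },
    5,
    0
);
```

With the code from the question, $isDone could wrap $operation->reload() followed by $operation->isComplete(), and $fetch could return $operation->results(). Unlike a bare while loop, this gives up after a fixed number of attempts.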

Harri H. On

The Speech API documentation (https://cloud.google.com/speech/docs/encoding) says that WAV files are not supported. The audio should be a raw file without any headers (with a *.raw extension). The SoX conversion should include a "--type=FILETYPE" option, but unfortunately I'm not sure whether it is "--type=raw" or something else.

mickeywilko On

You must pass Google Cloud Storage objects.

So try:

use \Google\Cloud\ServiceBuilder;

$cloud = new ServiceBuilder([
    'keyFilePath' => '/var/www/cert/gcloud_key.json',
    'projectId' => 'm****n-141000'
]);

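// Bucket and object names taken from the gs:// path in the question.
$bucket_name = 'm****n-141000.appspot.com';
$audio_filename = 'output.wav';

$storage = $cloud->storage();
$bucket = $storage->bucket($bucket_name);
$object = $bucket->object($audio_filename);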

$speech = $cloud->speech();

$operation = $speech->beginRecognizeOperation(
    $object, [
    'encoding' => 'LINEAR16',
    'sampleRate' => 16000
]);

$isComplete = $operation->isComplete();

while (!$isComplete) {
    sleep(1);
    $operation->reload();
    $isComplete = $operation->isComplete();
}

var_dump($operation->results());
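
If you already have a gs:// path, as in the question, the bucket and object names used above can be derived from it. The helper below is a hypothetical sketch (splitGcsUri is not part of any library) and the URI in the demo is a made-up example.

```php
<?php
/**
 * Split a gs:// URI into [bucket, object].
 * Throws if the string is not of the form gs://bucket/object.
 */
function splitGcsUri(string $uri): array
{
    if (!preg_match('#^gs://([^/]+)/(.+)$#', $uri, $m)) {
        throw new InvalidArgumentException("Not a gs:// object URI: $uri");
    }
    return [$m[1], $m[2]];
}

// Example URI only; substitute your own bucket and object.
list($bucket_name, $audio_filename) = splitGcsUri('gs://example-bucket/audio/output.wav');
```

The two values can then be passed to $storage->bucket() and $bucket->object() as in the snippet above.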