Google Speech-To-Text Cannot Process AMR audio?


I have a Java app that sends call recordings to my Node server, where I transcribe them using Google's Speech-To-Text tool. Here is part of the server code:

const speech = require('@google-cloud/speech');
const speechClient = new speech.SpeechClient();

  // Inside the async upload handler: req.files[0].buffer holds the raw upload
  const file = req.files[0].buffer;
  const audioBytes = file.toString('base64');

  const audio = {
    content: audioBytes,
  };
  const config = {
    encoding: 'AMR_WB',
    sampleRateHertz: 16000,
    languageCode: 'bn-BD',
  };
  // recognize() resolves to an array whose first element is the response
  const [response] = await speechClient.recognize({audio, config});
  const transcription = response.results.map(r => r.alternatives[0].transcript).join("\n");
  console.log(transcription);

Since I'm using a lossy format that the official docs here list as supported, it should work, but I just get an empty string. Any other sampleRateHertz value throws a bad sample rate error.

I also tried this combination, which returns an empty string as well:

  const config = {
    encoding: 'LINEAR16',   // Audio encoding (change if needed). FLAC/LINEAR16/AMR_WB
    sampleRateHertz: 44000, // Audio sample rate in Hertz (change if needed).
    languageCode: 'bn-BD',  // Language code for the audio (change if needed).
  };
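
One way to check whether the encoding and sampleRateHertz values match the uploaded bytes is to look at the file header before calling recognize. The sketch below assumes the upload is a raw AMR file rather than a container such as 3GP. Raw AMR files begin with the marker "#!AMR\n" (narrowband, which Speech-To-Text expects as encoding 'AMR' at 8000 Hz) or "#!AMR-WB\n" (wideband, 'AMR_WB' at 16000 Hz). The helper name guessAmrConfig is hypothetical and not part of the Speech-To-Text client.

```javascript
// Hypothetical helper (not part of @google-cloud/speech): inspects the first
// bytes of an uploaded buffer and suggests encoding/sampleRateHertz values.
// Assumes a raw AMR file; returns null when neither AMR marker is present.
function guessAmrConfig(buffer) {
  const WB_MAGIC = '#!AMR-WB\n'; // AMR wideband file marker
  const NB_MAGIC = '#!AMR\n';    // AMR narrowband file marker

  // Only the first few bytes are needed; latin1 maps each byte to one char.
  const head = buffer.subarray(0, WB_MAGIC.length).toString('latin1');

  if (head.startsWith(WB_MAGIC)) {
    return { encoding: 'AMR_WB', sampleRateHertz: 16000 };
  }
  if (head.startsWith(NB_MAGIC)) {
    return { encoding: 'AMR', sampleRateHertz: 8000 };
  }
  return null; // not a raw AMR file (could be a container or another codec)
}
```

Calling guessAmrConfig(req.files[0].buffer) and logging the result would show whether the recording is narrowband, wideband, or something else entirely. A null result would suggest the bytes are not raw AMR, in which case neither 'AMR' nor 'AMR_WB' may be the right encoding to send.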

Please help. I'm recording calls on an old Samsung phone running Android 6 and transcribing them for sentiment analysis, as part of a school project.

Thanks

