speech_recognition Throws error ``audio_data`` must be audio data

Question

120 views Asked by Mr Learner At 07 November 2023 at 14:45

I have a video file and I want to determine the confidence level of the speaker. To do that, I first extract the audio from the video file. The code for that is below.

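from moviepy.editor import AudioFileClip

local_video_path = "Video.mp4"
sound = AudioFileClip(local_video_path)
# Same arguments as before, written as keywords so each value is labeled.
sound.write_audiofile("sound.wav", fps=44100, nbytes=2, buffersize=2000, codec="pcm_s32le")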

The above code runs successfully and creates an audio file.

In the next step, I use the speech_recognition library to determine the confidence level of the speaker, with the following code:

import speech_recognition as sr

recognizer = sr.Recognizer()

audio = sr.AudioFile('sound.wav')
#text = recognizer.recognize_audio(audio)
text = recognizer.recognize_google(audio)

confidence = recognizer.confidence()

print(confidence)

But I am getting the error:

AssertionError: ``audio_data`` must be audio data

I also tried other approaches, such as the following code from this URL.

import speech_recognition as sr

r = sr.Recognizer()
file = sr.AudioFile('sound.wav')
with file as source:
    audio_file = r.record(source,duration=20)
print(r.recognize_google(audio_file))

But the above code raises ValueError: Audio file could not be read as PCM WAV, AIFF/AIFF-C, or Native FLAC; check if file is corrupted or in another format.

How can I resolve this issue?

There is 1 answer

**MoRoBe** · Answer 1 · 2023-11-07T15:24:58+00:00

I'd suggest using ffmpeg to extract the audio from the video, as described here. This makes it easy to try different formats until you find one that works. For me, a 16-bit mono WAV worked, extracted with:

ffmpeg -i 'Video.mp4' -map 0:a -acodec pcm_s16le -ar 22050 -ac 1 audio.wav
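If you would rather stay in Python, the ffmpeg command from this answer can be run through `subprocess`. The sketch below is only an illustration: the helper names `build_ffmpeg_command`, `extract_audio`, and `describe_wav` are made up for this example, and the `-y` flag (overwrite the output without prompting) is an addition that is not part of the command above. It assumes ffmpeg is installed and on your PATH.

```python
import subprocess
import wave


def build_ffmpeg_command(video_path, wav_path, sample_rate=22050):
    """Return the ffmpeg argument list for a 16-bit mono PCM WAV extraction."""
    return [
        "ffmpeg",
        "-y",                     # addition: overwrite the output without prompting
        "-i", video_path,
        "-map", "0:a",            # select the audio of the first input
        "-acodec", "pcm_s16le",   # 16-bit little-endian PCM
        "-ar", str(sample_rate),  # sample rate in Hz
        "-ac", "1",               # one channel (mono)
        wav_path,
    ]


def extract_audio(video_path, wav_path):
    """Run ffmpeg; raises CalledProcessError if ffmpeg exits with an error."""
    subprocess.run(build_ffmpeg_command(video_path, wav_path), check=True)


def describe_wav(wav_path):
    """Return (channels, bits_per_sample, sample_rate) read from the WAV header."""
    with wave.open(wav_path, "rb") as wav_file:
        return (
            wav_file.getnchannels(),
            wav_file.getsampwidth() * 8,
            wav_file.getframerate(),
        )


# Example usage (assumes Video.mp4 exists in the working directory):
# extract_audio("Video.mp4", "audio.wav")
# print(describe_wav("audio.wav"))
```

`describe_wav` uses only the standard-library `wave` module, so it is a quick way to check that the file you hand to speech_recognition really is 16-bit mono before you try to transcribe it.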

As a side note, you state you want to "determine the confidence level of the speaker". As far as I know, the returned confidence indicates how likely the transcription is to be correct, not how confident the speaker is.
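On the first error in the question: `recognize_google` expects the `AudioData` object returned by `recognizer.record(...)`, not the `sr.AudioFile` itself, and `Recognizer` has no `confidence()` method. A minimal sketch of reading the transcription confidence follows. It assumes that `recognize_google(..., show_all=True)` returns either an empty list or a dict shaped like `{"alternative": [{"transcript": ..., "confidence": ...}, ...]}`; that shape is an assumption you should check against your installed version. The function names `best_alternative` and `transcribe_with_confidence` are made up for this example.

```python
def best_alternative(response):
    """Return (transcript, confidence) for the first alternative that has a
    confidence value, or None if there is none.

    Assumed shape of `response` (from recognize_google(..., show_all=True)):
    {"alternative": [{"transcript": ..., "confidence": ...}, ...]},
    or an empty list when nothing was recognized.
    """
    if not isinstance(response, dict):
        return None
    for alternative in response.get("alternative", []):
        if "confidence" in alternative:
            return alternative["transcript"], alternative["confidence"]
    return None


def transcribe_with_confidence(wav_path):
    # Imported here so best_alternative() can be used without the package installed.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:
        # record() turns the file into AudioData, which recognize_google expects.
        audio = recognizer.record(source)
    response = recognizer.recognize_google(audio, show_all=True)
    return best_alternative(response)


# Example usage (needs network access and a readable WAV such as audio.wav):
# print(transcribe_with_confidence("audio.wav"))
```

Keep in mind that this number describes the transcription, so it will not tell you how confident the speaker sounds.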
