I have a video file and I want to determine the confidence level of the speaker. To do that, I first extract the audio from the video file. Below is the code for that.
from moviepy.editor import *
local_video_path = "Video.mp4"
sound = AudioFileClip(local_video_path)
sound.write_audiofile("sound.wav", 44100, 2, 2000,"pcm_s32le")
The above code runs successfully and creates an audio file.
In the next step I use the speech_recognition library to determine the confidence level of the speaker, with the following code:
import speech_recognition as sr
recognizer = sr.Recognizer()
audio = sr.AudioFile('sound.wav')
#text = recognizer.recognize_audio(audio)
text = recognizer.recognize_google(audio)
confidence = recognizer.confidence()
print(confidence)
But I am getting the error:
AssertionError: ``audio_data`` must be audio data
I even tried different approaches, like the following code from this URL.
import speech_recognition as sr
r = sr.Recognizer()
file = sr.AudioFile('sound.wav')
with file as source:
    audio_file = r.record(source,duration=20)
print(r.recognize_google(audio_file))
But the above code gives the error:
ValueError: Audio file could not be read as PCM WAV, AIFF/AIFF-C, or Native FLAC; check if file is corrupted or in another format.
How can I resolve this issue?
I'd suggest using ffmpeg to extract the audio from the video, as described here. This makes it easy to try different formats until you find one that works. For me, a 16-bit mono WAV extracted using
ffmpeg -i 'Video.mp4' -map 0:a -acodec pcm_s16le -ar 22050 -ac 1 audio.wav
worked.
As a side note, you state you want to "determine the confidence level of the speaker". As far as I know, the returned confidence indicates how likely the transcription is to be correct, not how confident the speaker is.
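To tie this back to the two errors in the question, here is a minimal sketch, not a definitive fix. The AssertionError most likely comes from passing the sr.AudioFile object straight to recognize_google, which expects the AudioData returned by recognizer.record(...). As far as I know, Recognizer has no confidence() method; the transcription confidence is part of the raw response you get with show_all=True. The helper names describe_wav, best_alternative and transcribe_with_confidence are made up for this example, and the file name audio.wav is assumed to be the output of the ffmpeg command above.

```python
import wave


def describe_wav(path):
    """Return (channels, bits_per_sample, sample_rate) read from the WAV header."""
    with wave.open(path, "rb") as wav:
        return wav.getnchannels(), wav.getsampwidth() * 8, wav.getframerate()


def best_alternative(response):
    """Pick the first alternative from a recognize_google(..., show_all=True) result.

    Returns (transcript, confidence). Confidence is None when the response
    does not include one, and both are None when nothing was recognized.
    """
    # With show_all=True an empty list is returned when there is no result.
    if not isinstance(response, dict):
        return None, None
    alternatives = response.get("alternative", [])
    if not alternatives:
        return None, None
    top = alternatives[0]
    return top.get("transcript"), top.get("confidence")


def transcribe_with_confidence(path, duration=None):
    """Transcribe a WAV file and return (transcript, confidence)."""
    # Imported lazily so the helpers above work without the package installed.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        # record() turns the file into AudioData, which recognize_google needs.
        audio = recognizer.record(source, duration=duration)
    # show_all=True returns the raw API response instead of just the text.
    return best_alternative(recognizer.recognize_google(audio, show_all=True))
```

Calling describe_wav("audio.wav") first is a quick way to confirm the file really is plain PCM (for the ffmpeg command above I would expect 1 channel, 16 bits, 22050 Hz). After that, transcribe_with_confidence("audio.wav", duration=20) needs network access, since recognize_google calls Google's web API.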