Regarding Whisper AI model transcribing the text into Spanish even if the audio is in English language

126 views Asked by At

I have an audio file that gets passed into the WhisperAI model and transcription happens. Simple right? But some of the audio files which are in English language which get passed through this and the transcribed text comes out in Spanish language. Below is my code snippet which transcribes audio into text.

path = path/to/file  
model_size = 'large'
model = whisper.load_model(model_size)
result = model.transcribe(path)
result_text = result['text']

So basically I have written a bash script that will give the audio files one by one to the Python script which consists of the model and thus it will transcribe the audio to text, save it inside the text file, and upload the text file to the S3 bucket for further use. But here even though the audio is in English language, for some files the whole text gets converted from English to Spanish.

It will be very helpful if someone gives me input on this.

Thank you

0

There are 0 answers