On Whisper API, when I try to use a python script for transcribing audio files in bulk, I can't get the correct response_format ('srt' or 'vtt') work

1.4k views Asked by At

I'm using this code for connecting to Whisper API and transcribe in bulk all mp3 in a folder to both srt and vtt:

import requests
import os
import openai

folder_path = "/content/audios/"
def transcribe_and_save(file_path, format):
    url = 'https://api.openai.com/v1/audio/transcriptions'
    headers = {'Authorization': 'Bearer MyToken'}
    files = {'file': open(file_path, 'rb'), 
            'model': (None, 'whisper-1'),
            'response_format': format}
    response = requests.post(url, headers=headers, files=files)
    output_path = os.path.join(folder_path, os.path.splitext(filename)[0] + '.' + format)
    with open(output_path, 'w') as f:
        f.write(response.content.decode('utf-8'))

for filename in os.listdir(folder_path):
    if filename.endswith('.mp3'):
        file_path = os.path.join(folder_path, filename)
        transcribe_and_save(file_path, 'srt')
        transcribe_and_save(file_path, 'vtt')
else:
    print('mp3s not found in folder')

When I use this code, I'm getting the following error:

"error": {
    "message": "1 validation error for Request\nbody -> response_format\n  value is not a valid enumeration member; permitted: 'json', 'text', 'vtt', 'srt', 'verbose_json' (type=type_error.enum; enum_values=[<ResponseFormat.JSON: 'json'>, <ResponseFormat.TEXT: 'text'>, <ResponseFormat.VTT: 'vtt'>, <ResponseFormat.SRT: 'srt'>, <ResponseFormat.VERBOSE_JSON: 'verbose_json'>])",
    "type": "invalid_request_error",
    "param": null,
    "code": null
  }

I've tried with different values, but either don't work or I'm only receiving the transcription as a object in plain text, but no srt or vtt. I'm expecting to get srt and vtt files in the same folder as where audios are

Thanks, Javi

3

There are 3 answers

1
Thomasssb1 On

I am not sure about the whisper api, but you seem to be using an already existing python function as a parameter name. Perhaps this could be a reason why it is not working, as the function format is being used when calling the endpoint instead of the parameter you passed in.

Try changing the parameter name to something other than format and change the value being used for response_format.

2
waghler On

I've found the solution, the problem was in one of the parameters 'response_format': (None, output_format):

def transcribe_and_save(file_path, output_format):
    url = 'https://api.openai.com/v1/audio/transcriptions'
    headers = {'Authorization': 'Bearer myToken'}
    files = {'file': open(file_path, 'rb'),
             'model': (None, 'whisper-1'),
             'response_format': (None, output_format)}
    response = requests.post(url, headers=headers, files=files)
    output_path = os.path.join(folder_path, os.path.splitext(os.path.basename(file_path))[0] + '.' + output_format)
    with open(output_path, 'w') as f:
        f.write(response.content.decode('utf-8'))

for filename in os.listdir(folder_path):
    if filename.endswith('.mp3'):
        file_path = os.path.join(folder_path, filename)
        transcribe_and_save(file_path, 'srt')
        transcribe_and_save(file_path, 'vtt')
else:
    print('mp3s not found in folder')
0
dazzafact On

Here's a working Solution for single files:

import requests
import os

OPENAI_API_KEY = "123xyzxyzxyzxyzxyzxyzxyzxyz"

token = f"Bearer {OPENAI_API_KEY}"

url = "https://api.openai.com/v1/audio/transcriptions"
model_name ="whisper-1"

headers ={
    "Authorization": token,
    "Content-Type": "multipart/form-data"
}

file_path ="1.mp3"
with open(file_path,"rb") as file:
    file_content = file.read()

payload = {
    "name": os.path.basename(file_path),
    "response_format": "json",
    "prompt": "transcribe this Chapter",
    "language": "de",
    "model": model_name
}

files = {
    "file": (os.path.basename(file_path), file_content, "audio/mp3")
}

response = requests.post(url, headers=headers, data=payload, files=files)


print(response.text)