raise EnvironmentError(
OSError: Can't load tokenizer for 'myusername/whisper-tiny-hi'. If you were trying to load it
from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise,
make sure 'myusername/whisper-tiny-hi' is the correct path to a directory containing all relevant files for a
WhisperTokenizerFast tokenizer.
from transformers import pipeline
import gradio as gr
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer
from transformers import AutoTokenizer, AutoModelForQuestionAnswering
from transformers import pipeline
# Extractive question-answering model + tokenizer (small BERT variant from the Hub).
QAtokenizer = AutoTokenizer.from_pretrained("SRDdev/QABERT-small")
QAmodel = AutoModelForQuestionAnswering.from_pretrained("SRDdev/QABERT-small")
# Example context passage and question, used as defaults for manual testing
# of the QA pipeline below.
text = '''Extractive Question Answering is the task of extracting an answer from a text given a question. An example of a
question-answering dataset is the SQuAD dataset, which is entirely based on that task. If you would like to fine-tune
a model on a SQuAD task, you may leverage the examples/pytorch/question-answering/run_squad.py script.'''
question = "What is extractive question answering?"
# Lazily-built QA pipeline, shared across calls to question_answering().
_qa_pipeline = None

def question_answering(text, question):
    """Extract the answer to *question* from the context passage *text*.

    Parameters:
        text: context string to search for the answer span.
        question: natural-language question about the context.

    Returns:
        The extracted answer string.
    """
    global _qa_pipeline
    # Building a transformers pipeline wraps the model/tokenizer and is
    # comparatively expensive; the original rebuilt it on every call.
    # Construct it once and reuse it.
    if _qa_pipeline is None:
        _qa_pipeline = pipeline("question-answering", model=QAmodel, tokenizer=QAtokenizer)
    result = _qa_pipeline(question=question, context=text)
    print(f"Answer: '{result['answer']}'")
    return result['answer']
# M2M100: many-to-many multilingual translation model (used for hi -> en below).
tmodel = M2M100ForConditionalGeneration.from_pretrained("facebook/m2m100_418M")
ttokenizer = M2M100Tokenizer.from_pretrained("facebook/m2m100_418M")
# NOTE(review): this is the line producing the OSError quoted at the top of the
# post — the Hub repo 'myusername/whisper-tiny-hi' either does not exist, is
# private, or is missing its tokenizer files, so WhisperTokenizerFast cannot be
# loaded. Verify the repo id and that tokenizer files were pushed with the
# model, or pass an explicit tokenizer (e.g. the base checkpoint's tokenizer)
# to pipeline(...). Clearing the local cache cannot fix a missing remote file.
pipe = pipeline(model="myusername/whisper-tiny-hi")
def translate(text):
    """Translate Hindi *text* to English using the M2M100 model.

    Returns a list of decoded English sentences (one per input sequence).
    """
    # The tokenizer must know the source language before encoding.
    ttokenizer.src_lang = "hi"
    inputs = ttokenizer(text, return_tensors="pt")
    # Force the decoder to begin generating in English.
    english_ids = tmodel.generate(
        **inputs, forced_bos_token_id=ttokenizer.get_lang_id("en")
    )
    return ttokenizer.batch_decode(english_ids, skip_special_tokens=True)
def transcribe(audio):
    """Run speech recognition on *audio*, then translate the Hindi transcript.

    Returns the list of English sentences produced by translate().
    """
    print(audio)
    hindi_text = pipe(audio)["text"]
    return translate(hindi_text)
def ques_ans(audio, question):
    """End-to-end pipeline: audio -> Hindi transcript -> English -> answer."""
    english_sentences = transcribe(audio)
    # transcribe() yields a list of decoded sentences; answer from the first.
    return question_answering(english_sentences[0], question)
I have tried multiple times, including after deleting the Hugging Face cache files, but the error persists. Please help me solve it.