I want to add a BERT NER module on top of openai-whisper. To train it, I feed the BERT model token IDs (not text) taken from the Whisper decoder, and the target output is the entity labels for those tokens in a one-hot encoding. The issue is that the BERT and openai-whisper tokenizers are different, so when I feed the token IDs into BERT NER they mean something different from what they meant originally. Can this be done, or is it simply not possible because BERT is a text-based LM that expects its own tokenization?
train input: [50364, 286, 362, 257, 3440, 4153, 2446, 412, 1266, 335, 13, 2555, 4160, 385, 13, 50614]
train label: [0. 0. 0. 0. 0. 1. 1. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
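To make the mismatch concrete, here is a rough sketch of what I mean, decoding the same IDs with each tokenizer (this assumes the multilingual Whisper tokenizer and bert-base-uncased; substitute whichever checkpoints are actually used):

```python
# Sketch of the tokenizer mismatch (assumes the multilingual Whisper tokenizer
# and bert-base-uncased; these are placeholders, not the exact setup).
from whisper.tokenizer import get_tokenizer
from transformers import BertTokenizerFast

whisper_tok = get_tokenizer(multilingual=True)
bert_tok = BertTokenizerFast.from_pretrained("bert-base-uncased")

ids = [50364, 286, 362, 257, 3440, 4153, 2446, 412, 1266, 335, 13,
       2555, 4160, 385, 13, 50614]

# The Whisper tokenizer maps these IDs back to the original transcript.
print("whisper:", whisper_tok.decode(ids))

# The same IDs looked up in BERT's vocabulary give unrelated word pieces
# (IDs above BERT's vocab size are dropped here to avoid lookup errors),
# so the entity labels no longer line up with the words they were meant for.
bert_ids = [i for i in ids if i < bert_tok.vocab_size]
print("bert:   ", bert_tok.decode(bert_ids))
```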