I'm trying to use my own vocab_file with GPT2Tokenizer, but I run into issues when certain tokens appear in the input.
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained('gpt2', vocab_file="./vocab.json")
encoding = tokenizer("Pa Pa Cl Cl Cl", return_tensors="pt", padding=True, truncation=True)
The above works as expected, but if I change the string to "Pa Pa Cl Cl Cl Nb", I get the following error:
ValueError: Unable to create tensor, you should probably activate truncation and/or padding with 'padding=True' 'truncation=True' to have batched tensors with the same length. Perhaps your features (`input_ids` in this case) have excessive nesting (inputs type `list` where type `int` is expected).
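To narrow this down, here is the debugging step I'm trying: encoding without return_tensors so I can inspect the raw output (just a sketch against the same vocab.json; "Nb" is the token I suspect is missing from it):

# Encode without return_tensors so the raw Python lists are printed
# instead of failing during tensor conversion.
raw = tokenizer("Pa Pa Cl Cl Cl Nb")
print(raw["input_ids"])  # I suspect a None shows up here for "Nb"

# Look up the suspect token's id directly in the vocab.
print(tokenizer.convert_tokens_to_ids("Nb"))

If that lookup comes back as None, that would explain the "excessive nesting" complaint when the tokenizer tries to build a tensor.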
My vocab_file is here