KeyError while using Trainer.predict() with Huggingface


I have fine-tuned a model for sentiment analysis using BertForSequenceClassification. I'm trying to run a prediction on an example sentence to find out whether its sentiment is negative or positive. Here is my code:

import transformers
from transformers import Trainer

model_path = "./path_to_directory_with_config.json"
tokenizer = transformers.BertTokenizer.from_pretrained('TurkuNLP/bert-base-finnish-cased-v1')
txt = "This was a nice place"
inputs = tokenizer(txt, return_tensors="pt")
print(inputs)
model = transformers.BertForSequenceClassification.from_pretrained(model_path, num_labels=1)
trainer = Trainer(model)
print(trainer.predict(inputs))

This throws an error:

KeyError: 'Indexing with integers (to access backend Encoding for a given batch index) is not available when using Python based tokenizers'

The model was trained (fine-tuned) with a dataset like this:

Text                       Sentiment
This was nice place            1
This was bad place             0  
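Even once predict() runs, it returns a PredictionOutput whose predictions field holds raw model outputs, not the 0/1 labels from the table. Because the model is loaded with num_labels=1, BertForSequenceClassification produces a single score per sentence (with one label and no explicit problem_type, transformers trains it as regression with an MSE loss), so deciding negative vs. positive needs a threshold. The sketch below assumes a 0.5 cutoff, which is only a guess based on the 0/1 targets, and also covers the two-logit case (num_labels=2) with argmax. to_sentiment_labels is a hypothetical helper, not part of transformers.

```python
import numpy as np


def to_sentiment_labels(predictions, threshold=0.5):
    """Map a 2-D array of raw model outputs to 0 (negative) / 1 (positive) per row.

    Assumption: shape (n, 1) means a num_labels=1 regression score,
    shape (n, 2) means one logit per class.
    """
    predictions = np.asarray(predictions)
    if predictions.ndim == 2 and predictions.shape[1] == 1:
        # Single score per sentence: threshold it (0.5 is an assumed cutoff).
        return (predictions[:, 0] >= threshold).astype(int)
    # One logit per class: the index of the larger logit is the predicted label.
    return predictions.argmax(axis=-1)
```

Used as `to_sentiment_labels(trainer.predict(dataset).predictions)`, where `dataset` is something Trainer can index example by example.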

What am I doing wrong? Any advice is highly appreciated.

Edit: Here is the full stack trace:

Traceback (most recent call last):
  File "C:\Users\Software Engineer\Desktop\Projekti\sentiment-analysis\data-raw\bin\inference.py", line 21, in <module>
    print(trainer.predict(inputs))
  File "C:\Users\Software Engineer\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\transformers\trainer.py", line 2185, in predict
    output = eval_loop(
  File "C:\Users\Software Engineer\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\transformers\trainer.py", line 2275, in evaluation_loop
    for step, inputs in enumerate(dataloader):
  File "C:\Users\Software Engineer\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\torch\utils\data\dataloader.py", line 521, in __next__
    data = self._next_data()
  File "C:\Users\Software Engineer\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\torch\utils\data\dataloader.py", line 561, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "C:\Users\Software Engineer\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\torch\utils\data\_utils\fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "C:\Users\Software Engineer\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\torch\utils\data\_utils\fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "C:\Users\Software Engineer\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\transformers\tokenization_utils_base.py", line 241, in __getitem__
    raise KeyError(
KeyError: 'Indexing with integers (to access backend Encoding for a given batch index) is not available when using Python based tokenizers'
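The last frames of the traceback show where it breaks: Trainer.predict() feeds its argument to a torch DataLoader, which fetches examples with `self.dataset[idx]` for an integer `idx`. The BatchEncoding returned by the tokenizer only supports integer indexing when it comes from a fast (Rust-backed) tokenizer, and BertTokenizer is the Python-based one, so the lookup raises the KeyError. Even with a fast tokenizer, integer indexing returns a low-level Encoding object rather than a dict of model inputs. A minimal sketch of one workaround, assuming you want to keep using Trainer, is to wrap the encodings in a small map-style Dataset that indexes each field separately. EncodedDataset is a hypothetical name, not a transformers class.

```python
import torch
from torch.utils.data import Dataset


class EncodedDataset(Dataset):
    """Expose tokenizer output as a map-style dataset of per-example dicts.

    `encodings` can be the BatchEncoding from tokenizer(..., return_tensors="pt")
    or any mapping of field name -> tensor whose first dimension is the batch.
    """

    def __init__(self, encodings):
        self.encodings = encodings

    def __len__(self):
        # Number of examples = number of rows in input_ids.
        return len(self.encodings["input_ids"])

    def __getitem__(self, idx):
        # Index every field (input_ids, attention_mask, ...) with the integer,
        # instead of indexing the BatchEncoding itself, which raised the KeyError.
        return {key: value[idx] for key, value in self.encodings.items()}
```

With this, the call would become `trainer.predict(EncodedDataset(inputs))`. Since `Trainer(model)` is created without a tokenizer, it should fall back to its default data collator, which stacks the per-example tensors back into a batch.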