I have fine-tuned a model for sentiment analysis using BertForSequenceClassification. I'm trying to run a prediction on an example sentence (to find out whether it is negative or positive). Here is my code:
```python
import transformers
from transformers import Trainer

model_path = "./path_to_directory_with_config.json"
tokenizer = transformers.BertTokenizer.from_pretrained('TurkuNLP/bert-base-finnish-cased-v1')
txt = "This was a nice place"
# Tokenize one sentence into a BatchEncoding of PyTorch tensors
inputs = tokenizer(txt, return_tensors="pt")
print(inputs)
model = transformers.BertForSequenceClassification.from_pretrained(model_path, num_labels=1)
trainer = Trainer(model)
print(trainer.predict(inputs))  # this call raises the KeyError
```
This throws an error:
```
KeyError: 'Indexing with integers (to access backend Encoding for a given batch index) is not available when using Python based tokenizers'
```
The model was trained (fine-tuned) on a dataset like this:

| Text | Sentiment |
|---|---|
| This was nice place | 1 |
| This was bad place | 0 |
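Since the goal is to decide positive vs. negative, the model's raw output still has to be mapped back to the 0/1 labels in the table above. A minimal sketch, assuming the logits for one sentence have already been extracted as a list of floats; `logits_to_sentiment` and the 0.5 cutoff are hypothetical choices, not part of transformers. With `num_labels=1`, `BertForSequenceClassification` treats the task as regression and returns a single score, so it is thresholded here; with `num_labels=2` it returns one logit per class, so the argmax is the label.

```python
from typing import Sequence


def logits_to_sentiment(logits: Sequence[float], threshold: float = 0.5) -> int:
    """Hypothetical helper: map one model output row to the 0/1 training labels.

    - One value (num_labels=1, regression head): compare the score to an
      assumed threshold (0.5 sits halfway between the 0 and 1 targets).
    - Two or more values (classification head): return the argmax index.
    """
    if len(logits) == 1:
        return int(logits[0] >= threshold)
    return max(range(len(logits)), key=lambda i: logits[i])
```

For example, `logits_to_sentiment([0.9])` gives `1` (positive) and `logits_to_sentiment([2.0, -1.0])` gives `0` (negative).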
What am I doing wrong? Any advice is highly appreciated.
Edit: Here is the full stack trace:
```
Traceback (most recent call last):
  File "C:\Users\Software Engineer\Desktop\Projekti\sentiment-analysis\data-raw\bin\inference.py", line 21, in <module>
    print(trainer.predict(inputs))
  File "C:\Users\Software Engineer\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\transformers\trainer.py", line 2185, in predict
    output = eval_loop(
  File "C:\Users\Software Engineer\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\transformers\trainer.py", line 2275, in evaluation_loop
    for step, inputs in enumerate(dataloader):
  File "C:\Users\Software Engineer\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\torch\utils\data\dataloader.py", line 521, in __next__
    data = self._next_data()
  File "C:\Users\Software Engineer\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\torch\utils\data\dataloader.py", line 561, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "C:\Users\Software Engineer\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\torch\utils\data\_utils\fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "C:\Users\Software Engineer\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\torch\utils\data\_utils\fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "C:\Users\Software Engineer\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\transformers\tokenization_utils_base.py", line 241, in __getitem__
    raise KeyError(
KeyError: 'Indexing with integers (to access backend Encoding for a given batch index) is not available when using Python based tokenizers'
```
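The traceback shows `Trainer.predict` wrapping its argument in a PyTorch `DataLoader`, which fetches examples with `self.dataset[idx]` for integer `idx`. The `BatchEncoding` returned by the Python (slow) `BertTokenizer` only accepts string keys such as `"input_ids"`, hence the `KeyError`. A minimal sketch of a map-style dataset that splits the tokenizer output into per-example dicts; the class name `EncodedTextDataset` is hypothetical, and it is written as a plain class (only `__len__` and `__getitem__`) so it runs without torch, though subclassing `torch.utils.data.Dataset` would be the more conventional choice.

```python
from typing import Any, Dict, Mapping, Sequence


class EncodedTextDataset:
    """Hypothetical map-style dataset over tokenizer output.

    Stores {"input_ids": [...], "attention_mask": [...], ...} where each value
    holds one entry per example, and serves one example per integer index.
    """

    def __init__(self, encodings: Mapping[str, Sequence[Any]]) -> None:
        # Copy into a plain dict so integer indexing never hits BatchEncoding.
        self.encodings = dict(encodings)

    def __len__(self) -> int:
        # Number of examples (len() of a BatchEncoding counts its keys instead).
        return len(self.encodings["input_ids"])

    def __getitem__(self, idx: int) -> Dict[str, Any]:
        # One example's features, keyed like the model's forward() arguments.
        return {key: values[idx] for key, values in self.encodings.items()}
```

Assumed usage, not run here: `enc = tokenizer([txt], truncation=True, padding=True)` followed by `trainer.predict(EncodedTextDataset(enc))`, whose `.predictions` should then hold one row of logits per input sentence.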