I am trying to use MBart50TokenizerFast with facebook/mbart-large-50-many-to-one-mmt on GPU, passing multiple sentences in one batch (the sentences cannot be combined into a single string). Here is my code (based on https://stackoverflow.com/a/62688252/194742):
tokenizer.src_lang = source_lang
inputs = tokenizer([title, ftext], return_tensors="pt").to(device)
outputs = model.generate(**inputs).to(device)
translations = tokenizer.batch_decode(outputs, skip_special_tokens=True)
translated_title = translations[0]
translated_ftext = translations[1]
This mostly follows the example given on the page, except that I am trying to include multiple sentences in one go. Here is the error message I get:
Unable to create tensor, you should probably activate truncation and/or padding with 'padding=True' 'truncation=True' to have batched tensors with the same length. Perhaps your features (`input_ids` in this case) have excessive nesting (inputs type `list` where type `int` is expected).
The code does work when I pass only a single sentence, i.e. with this line:
inputs = tokenizer(title, return_tensors="pt").to(device)
What is the correct way to use multiple sentences? Thanks for any pointers.
Looks like I had to enable padding (and truncation) as suggested in the error message: with a single sentence the tensor always has one consistent length, but with two sentences of different lengths the tokenizer must pad the shorter one so both sequences fit into one batched tensor. The final code that works: