I'm trying to fine-tune mistralai/Mistral-7B-v0.1 using the following sample notebook. I followed the steps in the notebook, but training fails with:
```
***** Running training *****
Num examples = 344
Num Epochs = 3
Instantaneous batch size per device = 2
Total train batch size (w. parallel, distributed & accumulation) = 2
Gradient Accumulation steps = 1
Total optimization steps = 500
Number of trainable parameters = 21,260,288
0%| | 0/500 [00:00<?, ?it/s]You're using a LlamaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
File "/usr/local/lib/python3.10/dist-packages/transformers/models/mistral/modeling_mistral.py", line 293, in forward
raise ValueError(
ValueError: Attention mask should be of size (2, 1, 512, 1024), but is torch.Size([2, 1, 512, 512])
```
Any ideas where this attention mask issue could come from? My tokenized data is exactly of size 512. Why is it expecting size 1024, and why these particular four dimensions?
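For reference, the tokenization step looks roughly like this (a simplified sketch, not the exact notebook cell; the `"text"` field name is a placeholder), so every example really is padded/truncated to 512 tokens:

```python
# Simplified sketch of the preprocessing; the dataset field name "text" is a placeholder.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
tokenizer.pad_token = tokenizer.eos_token  # Mistral ships without a pad token by default

def tokenize(example):
    return tokenizer(
        example["text"],
        truncation=True,
        padding="max_length",
        max_length=512,
    )

sample = tokenize({"text": ["some training text", "another training text"]})
print(len(sample["input_ids"][0]))  # 512, which matches the (2, 1, 512, 512) mask in the error
```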
I'm experiencing the same issue; downgrading transformers to 4.35.2 instead of the latest version 4.36.0 seems to work fine.
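In a notebook, that means pinning the version with `pip install transformers==4.35.2` and restarting the runtime before re-running training. A quick sanity check that the downgrade actually took effect (just a sketch; 4.35.2 is simply the version that works for me, not an official fix):

```python
# Confirm the runtime picked up the downgraded transformers before re-running training.
import transformers

assert transformers.__version__ == "4.35.2", transformers.__version__
```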