I am working on text classification using the Longformer model. Even when I take just the first 100 rows of my dataframe, I get a memory error. I am using Google Colab.
This is my model :
model = LongformerForMultiSequenceClassification.from_pretrained('allenai/longformer-base-4096',
# Accessing the model configuration
configuration = model.config
Training Loop
for epoch in tqdm(range(1, epochs+1)):
    loss_train_total = 0
    progress_bar = tqdm(dataloader_train, desc='Epoch {:1d}'.format(epoch), leave=False, disable=False)
    for batch in progress_bar:
        #this will empty the gradients from the previous iterations
        model.zero_grad()
        #take out inputs
        batch = tuple(b.to(device) for b in batch)
        inputs = {'input_ids': batch[0],
                  'attention_mask': batch[1],
                  'labels': batch[2],
                 }
        #insert the input into the model and get the result
        outputs = model(**inputs)

        #calculate loss
        loss = outputs[0]
        loss_train_total += loss.item()
        #this will calculate the gradients
        loss.backward()
        # for preventing gradient explosion
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        #this will update the weights
        optimizer.step()
        #optimizing learning rate
        scheduler.step()
        progress_bar.set_postfix({'training_loss': '{:.3f}'.format(loss.item()/len(batch))})
    torch.save(model.state_dict(), f'/content/Gdrive/My Drive/finetuned_longformer_epoch_{epoch}.model')
    #torch.save(model.state_dict(), f'checkpoint{epoch}.pth')
    tqdm.write(f'\nEpoch {epoch}')
    loss_train_avg = loss_train_total/len(dataloader_train)
    tqdm.write(f'Training loss: {loss_train_avg}')
    val_loss, predictions, true_vals = evaluate(dataloader_validation)
    val_f1 = f1_score_func(predictions, true_vals)
    tqdm.write(f'Validation loss: {val_loss}')
    tqdm.write(f'F1 Score (Weighted): {val_f1}')
Error :
RuntimeError Traceback (most recent call last)
<ipython-input-32-7e534d564c0a> in <module>()
20 }
21 #insert the input into the model and get the result
---> 22 outputs = model(**inputs)
24 #calculate loss
12 frames
/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py in dropout(input, p, training, inplace)
971 return (_VF.dropout_(input, p, training)
972 if inplace
--> 973 else _VF.dropout(input, p, training))
RuntimeError: CUDA out of memory. Tried to allocate 182.00 MiB (GPU 0; 11.17 GiB total capacity; 10.23 GiB already allocated; 59.81 MiB free; 10.69 GiB reserved in total by PyTorch)
You can check my config file, model structure, and custom class for global attention. My complete code is on Colab here :
I ran similar code using BERT and it worked without any problem.
I am new to data science. Please help.
There are a few things to check to solve this error.
Call zero_grad() at the start of every training step. It clears old gradients from the last step, but optimizer.zero_grad() only covers the whole model if all your model parameters are in the same optimizer; otherwise call model.zero_grad().
Edit: The Longformer git repo has a somewhat similar issue at https://github.com/allenai/longformer/issues/41. This might be useful if you are using a similar configuration.
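As a small illustration of the point about clearing gradients, here is a minimal sketch on a toy model (not the Longformer from the question). The two-layer network, the SGD optimizer, and the helper `has_grad` are all made up for the example; the second layer is deliberately left out of the optimizer to show what `optimizer.zero_grad()` misses.

```python
import torch
from torch import nn

# Hypothetical toy model: only the FIRST layer is handed to the optimizer.
model = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 2))
optimizer = torch.optim.SGD(model[0].parameters(), lr=0.1)


def has_grad(param):
    """Return True if the parameter currently holds a non-zero gradient."""
    return param.grad is not None and bool(param.grad.abs().sum() > 0)


# One backward pass fills .grad on every parameter of the model.
loss = model(torch.randn(8, 4)).sum()
loss.backward()

# The optimizer only resets the parameters it manages (the first layer),
# so the second layer's bias still carries its old gradient.
optimizer.zero_grad()
stale_after_optimizer = has_grad(model[1].bias)

# model.zero_grad() resets every parameter of the model.
model.zero_grad()
stale_after_model = has_grad(model[1].bias)

print(stale_after_optimizer, stale_after_model)
```

If every parameter is passed to a single optimizer, as in most fine-tuning scripts, the two calls are interchangeable.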
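Gradient checkpointing is also worth a look: it lowers memory use by recomputing activations during the backward pass instead of storing them.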
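To give a rough idea of what gradient checkpointing does, here is a minimal sketch using `torch.utils.checkpoint` on a small stack of linear layers. The class `CheckpointedStack` and its sizes are made up for the example, and it assumes a PyTorch version recent enough to accept the `use_reentrant` argument. Hugging Face models have their own switch for this, which depends on the installed version, so check the documentation for yours.

```python
import torch
from torch import nn
from torch.utils.checkpoint import checkpoint


class CheckpointedStack(nn.Module):
    """Hypothetical toy network whose blocks are run under checkpointing."""

    def __init__(self, width=16, depth=4):
        super().__init__()
        self.blocks = nn.ModuleList(
            [nn.Sequential(nn.Linear(width, width), nn.ReLU()) for _ in range(depth)]
        )

    def forward(self, x):
        for block in self.blocks:
            # Intermediate activations inside `block` are not kept;
            # they are recomputed when backward() reaches this block.
            x = checkpoint(block, x, use_reentrant=False)
        return x


torch.manual_seed(0)
net = CheckpointedStack()
x = torch.randn(2, 16)

out = net(x)
out.sum().backward()  # gradients still reach every parameter
```

The trade-off is extra compute in the backward pass in exchange for lower peak memory, which can matter with long input sequences.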