I am working on text classification using the Longformer model. Even with just the first 100 rows of my dataframe, I get a memory error. I am using Google Colab.
This is my model:
model = LongformerForMultiSequenceClassification.from_pretrained('allenai/longformer-base-4096',
config=config)
# Accessing the model configuration
configuration = model.config
Training Loop
for epoch in tqdm(range(1, epochs+1)):
    model.train()
    loss_train_total = 0
    progress_bar = tqdm(dataloader_train, desc='Epoch {:1d}'.format(epoch), leave=False, disable=False)
    for batch in progress_bar:
        # this will empty the gradients from the previous iterations
        model.zero_grad()
        # take out inputs
        batch = tuple(b.to(device) for b in batch)
        inputs = {'input_ids': batch[0],
                  'attention_mask': batch[1],
                  'labels': batch[2],
                  }
        # insert the input into the model and get the result
        outputs = model(**inputs)
        # calculate loss
        loss = outputs[0]
        loss_train_total += loss.item()
        # this will calculate the gradients
        loss.backward()
        # for preventing gradient explosion
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        # this will update the weights
        optimizer.step()
        # optimizing learning rate
        scheduler.step()
        progress_bar.set_postfix({'training_loss': '{:.3f}'.format(loss.item()/len(batch))})
    torch.save(model.state_dict(), f'/content/Gdrive/My Drive/finetuned_longformer_epoch_{epoch}.model')
    #torch.save(model.state_dict(), f'checkpoint{epoch}.pth')
    tqdm.write(f'\nEpoch {epoch}')
    loss_train_avg = loss_train_total/len(dataloader_train)
    tqdm.write(f'Training loss: {loss_train_avg}')
    val_loss, predictions, true_vals = evaluate(dataloader_validation)
    val_f1 = f1_score_func(predictions, true_vals)
    tqdm.write(f'Validation loss: {val_loss}')
    tqdm.write(f'F1 Score (Weighted): {val_f1}')
Error:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-32-7e534d564c0a> in <module>()
20 }
21 #insert the input into the model and get the result
---> 22 outputs = model(**inputs)
23
24 #calculate loss
12 frames
/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py in dropout(input, p, training, inplace)
971 return (_VF.dropout_(input, p, training)
972 if inplace
--> 973 else _VF.dropout(input, p, training))
974
975
RuntimeError: CUDA out of memory. Tried to allocate 182.00 MiB (GPU 0; 11.17 GiB total capacity; 10.23 GiB already allocated; 59.81 MiB free; 10.69 GiB reserved in total by PyTorch)
You can check my config file, model structure, and custom class for global attention in my complete code on Colab here:
https://colab.research.google.com/drive/19JkCht_4u6UrwcUcWNnSD2YtnsJYer0H?usp=sharing
I ran similar code using BERT and it worked without any problem.
I am new to data science. Please help.
There are a few things to check to solve this error. Call optimizer.zero_grad() after optimizer.step(). model.zero_grad() clears old gradients from the last step, but it is equivalent to optimizer.zero_grad() only if all of your model's parameters are in the same optimizer.
Edit: The Longformer git repo has a somewhat similar issue at https://github.com/allenai/longformer/issues/41. This might be useful if you are using a similar configuration.
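The zero_grad ordering suggested above can be sketched roughly as follows. This is a minimal illustration on a toy linear model, not the Longformer setup from the question: the model, the random data, and the names features, labels, and losses are all assumptions made up for the example.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Toy stand-ins (assumptions for illustration, not the Longformer setup)
model = nn.Linear(8, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

features = torch.randn(16, 8)
labels = torch.randint(0, 2, (16,))

losses = []
for step in range(4):
    x = features[step * 4:(step + 1) * 4]
    y = labels[step * 4:(step + 1) * 4]

    loss = loss_fn(model(x), y)
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
    optimizer.step()

    # Clear gradients right after the update so they do not linger
    optimizer.zero_grad()

    # Keep a Python float, not the tensor, so the graph can be freed
    losses.append(loss.item())

# Depending on the PyTorch version, cleared gradients are None or all zeros
grads_cleared = all(
    p.grad is None or not p.grad.any() for p in model.parameters()
)
```

Here every parameter of the toy model is registered in the single optimizer, so optimizer.zero_grad() and model.zero_grad() would have the same effect.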
Gradient checkpointing is another idea worth trying to reduce memory use.
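As a rough illustration of what gradient checkpointing does, here is a sketch using torch.utils.checkpoint on a small toy network. The class CheckpointedMLP and its sizes are hypothetical and unrelated to Longformer's internals, and the use_reentrant=False argument assumes a reasonably recent PyTorch release. For Hugging Face models, recent transformers versions expose model.gradient_checkpointing_enable(); whether that applies to your installed version and custom class is something to check.

```python
import torch
from torch import nn
from torch.utils.checkpoint import checkpoint

torch.manual_seed(0)


class CheckpointedMLP(nn.Module):
    """Hypothetical toy model whose blocks are recomputed during backward."""

    def __init__(self, dim=16, n_blocks=3):
        super().__init__()
        self.blocks = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, dim), nn.ReLU()) for _ in range(n_blocks)]
        )
        self.head = nn.Linear(dim, 2)

    def forward(self, x):
        for block in self.blocks:
            # Do not keep this block's intermediate activations;
            # they are recomputed when backward reaches the block
            x = checkpoint(block, x, use_reentrant=False)
        return self.head(x)


model = CheckpointedMLP()
x = torch.randn(4, 16)

loss = model(x).sum()
loss.backward()

# Gradients still reach the earliest layer despite the recomputation
first_weight_grad = model.blocks[0][0].weight.grad
```

The trade-off is extra compute in the backward pass in exchange for lower activation memory, which tends to matter most for long input sequences.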