Getting CUDA out of memory while running Longformer model in Google Colab. Similar code using BERT works fine

I am working on text classification using the Longformer model. Even when I use only the first 100 rows of the dataframe, I get a memory error. I am using Google Colab.

This is my model:

model = LongformerForMultiSequenceClassification.from_pretrained('allenai/longformer-base-4096',
                                        config=config)
# Accessing the model configuration
configuration = model.config

[Image: model configuration]

[Image: custom class for global attention]

Training Loop

   
for epoch in tqdm(range(1, epochs+1)):
    
    model.train()
    
    loss_train_total = 0

    progress_bar = tqdm(dataloader_train, desc='Epoch {:1d}'.format(epoch), leave=False, disable=False)
    for batch in progress_bar:
        
        #this will empty the gradients from the previous iterations
        model.zero_grad()
        
        #take out inputs
        batch = tuple(b.to(device) for b in batch)
        
        inputs = {'input_ids':      batch[0],
                  'attention_mask': batch[1],
                  'labels':         batch[2],
                 }       
        #insert the input into the model and get the result
        outputs = model(**inputs)
        
        #calculate loss
        loss = outputs[0]
        loss_train_total += loss.item()

        #this will calculate the gradients
        loss.backward()
        # clip gradients to prevent gradient explosion
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        #this will update the weights 
        optimizer.step()
        #optimizing learning rate
        scheduler.step()
        
        progress_bar.set_postfix({'training_loss': '{:.3f}'.format(loss.item()/len(batch))})
         
        
    torch.save(model.state_dict(), f'/content/Gdrive/My Drive/finetuned_longformer_epoch_{epoch}.model')
    #torch.save(model.state_dict(), f'checkpoint{epoch}.pth')
        
    tqdm.write(f'\nEpoch {epoch}')
    
    loss_train_avg = loss_train_total/len(dataloader_train)            
    tqdm.write(f'Training loss: {loss_train_avg}')
    
    val_loss, predictions, true_vals = evaluate(dataloader_validation)
    val_f1 = f1_score_func(predictions, true_vals)
    tqdm.write(f'Validation loss: {val_loss}')
    tqdm.write(f'F1 Score (Weighted): {val_f1}')

Error :

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-32-7e534d564c0a> in <module>()
     20               }       
     21      #insert the input into the model and get the result
---> 22      outputs = model(**inputs)
     23 
     24      #calculate loss

12 frames
/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py in dropout(input, p, training, inplace)
    971     return (_VF.dropout_(input, p, training)
    972             if inplace
--> 973             else _VF.dropout(input, p, training))
    974 
    975 

RuntimeError: CUDA out of memory. Tried to allocate 182.00 MiB (GPU 0; 11.17 GiB total capacity; 10.23 GiB already allocated; 59.81 MiB free; 10.69 GiB reserved in total by PyTorch)

You can check my config file, model structure, and custom class for global attention above. My complete code is on Colab here:

https://colab.research.google.com/drive/19JkCht_4u6UrwcUcWNnSD2YtnsJYer0H?usp=sharing

I ran similar code using BERT and it works without any problem.

I am new to data science. Please help.

There is 1 answer

Pritesh Gohil

There are a few things to check to resolve this error. Call optimizer.zero_grad() after optimizer.step(). model.zero_grad() clears the gradients of the model's parameters from the last step, but it is equivalent to optimizer.zero_grad() only if all of your model's parameters are in the same optimizer.
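The relationship between the two zero_grad() calls can be checked on a toy model. This is a minimal sketch, not the asker's Longformer setup: the layer sizes, the `head` module, and the `grads_cleared` helper are made up for illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)
model = nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)


def grads_cleared(module):
    # zero_grad() either zeroes the gradient tensors or sets them to None,
    # depending on the PyTorch version and arguments, so accept both.
    return all(p.grad is None or not bool(p.grad.any())
               for p in module.parameters())


# Usual pattern: backward, step, then clear the gradients.
for _ in range(2):
    loss = model(torch.randn(3, 4)).pow(2).mean()
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
cleared_by_optimizer = grads_cleared(model)

# A hypothetical extra layer that the optimizer does not know about.
head = nn.Linear(2, 1)
full = nn.Sequential(model, head)
full(torch.randn(3, 4)).sum().backward()

optimizer.zero_grad()   # clears only the parameters registered in the optimizer
head_still_has_grad = not grads_cleared(head)

full.zero_grad()        # clears every parameter of the module
all_cleared = grads_cleared(full)
```

If every parameter is registered in one optimizer, as in the training loop in the question, either call clears the same gradients.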

  1. The first and most important step when dealing with a CUDA memory issue is to reduce the batch size to 1.
  2. Try the SGD optimizer. According to a post on the PyTorch forum, Adam uses more memory than SGD (Adam keeps two extra state tensors for every parameter).
  3. Your model is large and consumes a lot of GPU memory on initialization. Try reducing the size of the model and check whether that solves the memory problem.
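Points 1 and 2 can be sketched together as follows. Gradient accumulation is not part of the suggestions above; it is added here as an assumption, because it is a common way to keep the effective batch size while only one sample is on the GPU at a time. The toy data, the linear model, and `accumulation_steps` are placeholders rather than the asker's configuration.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
features = torch.randn(8, 4)            # toy inputs standing in for tokenized text
labels = torch.randint(0, 2, (8,))      # toy binary labels
dataset = TensorDataset(features, labels)

model = nn.Linear(4, 2)                 # stand-in for the real classifier
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)  # SGD instead of Adam

accumulation_steps = 4                  # effective batch size = 1 * 4
loader = DataLoader(dataset, batch_size=1, shuffle=False)

updates = 0
optimizer.zero_grad()
for step, (x, y) in enumerate(loader, start=1):
    # Scale the loss so the accumulated gradient matches the mean over 4 samples.
    loss = loss_fn(model(x), y) / accumulation_steps
    loss.backward()                     # gradients add up across micro-batches
    if step % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()           # clear gradients after the update
        updates += 1
```

With 8 samples and `accumulation_steps = 4`, the loop performs two weight updates, each based on four single-sample forward and backward passes.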

Edit: The Longformer GitHub repo has a somewhat similar issue at https://github.com/allenai/longformer/issues/41. This might be useful if you are using a similar configuration.

Gradient checkpointing is also worth looking into.
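As a rough illustration of the idea, gradient checkpointing drops intermediate activations during the forward pass and recomputes them during the backward pass, trading extra compute for lower memory. The sketch below uses `torch.utils.checkpoint.checkpoint` on a small made-up network (`CheckpointedMLP` is hypothetical) and assumes a reasonably recent PyTorch that accepts the `use_reentrant` argument. Recent versions of Hugging Face transformers also expose `gradient_checkpointing_enable()` on their models; whether that works with a custom Longformer class is something to verify for your version.

```python
import torch
from torch import nn
from torch.utils.checkpoint import checkpoint


class CheckpointedMLP(nn.Module):
    """Toy stack of blocks; each block can be run under checkpointing."""

    def __init__(self, dim=16, depth=4):
        super().__init__()
        self.blocks = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, dim), nn.ReLU()) for _ in range(depth)]
        )

    def forward(self, x, use_checkpoint=True):
        for block in self.blocks:
            if use_checkpoint:
                # Activations inside `block` are not stored; they are
                # recomputed when backward() reaches this block.
                x = checkpoint(block, x, use_reentrant=False)
            else:
                x = block(x)
        return x


torch.manual_seed(0)
model = CheckpointedMLP()
x = torch.randn(2, 16)


def grads(use_checkpoint):
    model.zero_grad()
    model(x, use_checkpoint=use_checkpoint).sum().backward()
    return [p.grad.clone() for p in model.parameters()]


plain = grads(False)   # ordinary forward/backward
ckpt = grads(True)     # same computation with checkpointing
```

Both runs produce the same gradients; only the memory and compute profile differs, which is why the technique is commonly suggested for long-sequence models.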