Performing PEFT with LoRA on a FLAN-T5 model causes a "No executable batch size found" error


I'm trying to perform PEFT with LoRA on the Google FLAN-T5 base model, using the Python code below. I'm running it on an NVIDIA GPU with 8 GB of RAM on Ubuntu Server 18.04 LTS. In the code I load a public dataset from Hugging Face, load the pre-trained FLAN-T5 model, and set up the PEFT/LoRA configuration.

I then add the LoRA adapter layers to the original LLM and define a Trainer instance, but when I try to train the PEFT adapter and save the model, I get the error below: "No executable batch size found, reached zero."

Can anyone see what the issue might be and suggest how to solve it?

Code:

# import modules
from datasets import load_dataset
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, GenerationConfig, TrainingArguments, Trainer
import torch
import time
import evaluate
import pandas as pd
import numpy as np


# load dataset and LLM 

huggingface_dataset_name = "knkarthick/dialogsum"

dataset = load_dataset(huggingface_dataset_name)


# load pre-trained FLAN-T5 model 

model_name='google/flan-t5-base'

original_model = AutoModelForSeq2SeqLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_name)
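
# NOTE: the snippet later references tokenized_datasets, so a preprocessing step like the
# one below (assumed here; adapt the prompt and column names to your setup) must run first.
def tokenize_function(example):
    start_prompt = 'Summarize the following conversation.\n\n'
    end_prompt = '\n\nSummary: '
    prompt = [start_prompt + dialogue + end_prompt for dialogue in example["dialogue"]]
    example['input_ids'] = tokenizer(prompt, padding="max_length", truncation=True, return_tensors="pt").input_ids
    example['labels'] = tokenizer(example["summary"], padding="max_length", truncation=True, return_tensors="pt").input_ids
    return example

tokenized_datasets = dataset.map(tokenize_function, batched=True)
tokenized_datasets = tokenized_datasets.remove_columns(['id', 'topic', 'dialogue', 'summary'])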

# set up PEFT LoRA model

from peft import LoraConfig, get_peft_model, TaskType

lora_config = LoraConfig(
    r=32, # Rank
    lora_alpha=32,
    target_modules=["q", "v"],
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.SEQ_2_SEQ_LM # FLAN-T5
)
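
# helper used below; assumed to be defined earlier in the original notebook --
# a typical implementation that reports trainable vs. total parameters:
def print_number_of_trainable_model_parameters(model):
    trainable_model_params = 0
    all_model_params = 0
    for _, param in model.named_parameters():
        all_model_params += param.numel()
        if param.requires_grad:
            trainable_model_params += param.numel()
    return f"trainable model parameters: {trainable_model_params}\n" \
           f"all model parameters: {all_model_params}\n" \
           f"percentage of trainable model parameters: {100 * trainable_model_params / all_model_params:.2f}%"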

# add LoRA adapter layers/parameters to the original LLM to be trained

peft_model = get_peft_model(original_model, 
                            lora_config)
print(print_number_of_trainable_model_parameters(peft_model))


# define training arguments and create Trainer instance 

output_dir = f'./peft-dialogue-summary-training-{str(int(time.time()))}'

peft_training_args = TrainingArguments(
    output_dir=output_dir,
    auto_find_batch_size=True,
    learning_rate=1e-3, # Higher learning rate than full fine-tuning.
    num_train_epochs=1,
    logging_steps=1,
    max_steps=1    
)
    
peft_trainer = Trainer(
    model=peft_model,
    args=peft_training_args,
    train_dataset=tokenized_datasets["train"],
)

# train PEFT adapter and save the model 

peft_trainer.train()

peft_model_path="./peft-dialogue-summary-checkpoint-local"

peft_trainer.model.save_pretrained(peft_model_path)
tokenizer.save_pretrained(peft_model_path)

Error:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[16], line 1
----> 1 peft_trainer.train()
      3 peft_model_path="./peft-dialogue-summary-checkpoint-local"
      5 peft_trainer.model.save_pretrained(peft_model_path)

File ~/anaconda3/envs/new_llm/lib/python3.10/site-packages/transformers/trainer.py:1664, in Trainer.train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)
   1659     self.model_wrapped = self.model
   1661 inner_training_loop = find_executable_batch_size(
   1662     self._inner_training_loop, self._train_batch_size, args.auto_find_batch_size
   1663 )
-> 1664 return inner_training_loop(
   1665     args=args,
   1666     resume_from_checkpoint=resume_from_checkpoint,
   1667     trial=trial,
   1668     ignore_keys_for_eval=ignore_keys_for_eval,
   1669 )

File ~/anaconda3/envs/new_llm/lib/python3.10/site-packages/accelerate/utils/memory.py:134, in find_executable_batch_size.<locals>.decorator(*args, **kwargs)
    132 while True:
    133     if batch_size == 0:
--> 134         raise RuntimeError("No executable batch size found, reached zero.")
    135     try:
    136         return function(batch_size, *args, **kwargs)

RuntimeError: No executable batch size found, reached zero.

Update:

I restarted my kernel and the error went away; I'm not sure why. Perhaps a model I had run previously was still taking up GPU memory.
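
For reference, a minimal sketch of how to free GPU memory from an earlier run without restarting the kernel (old_model is a hypothetical variable name for whatever model is still referenced from a previous experiment):

import gc
import torch

# drop references to the earlier model, then release PyTorch's cached GPU memory
del old_model  # hypothetical: whatever variable still holds the old model
gc.collect()
torch.cuda.empty_cache()

print(f"{torch.cuda.memory_allocated() / 1024**2:.0f} MiB still allocated")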

1 Answer

Answer from J.Sean:

Try removing auto_find_batch_size=True from TrainingArguments and setting the batch size yourself.
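
For example (the batch size values below are illustrative; pick whatever fits the 8 GB card and halve it if you still hit out-of-memory errors):

peft_training_args = TrainingArguments(
    output_dir=output_dir,
    per_device_train_batch_size=4,   # explicit batch size instead of auto_find_batch_size
    gradient_accumulation_steps=4,   # optional: recover a larger effective batch size
    learning_rate=1e-3,
    num_train_epochs=1,
    logging_steps=1,
    max_steps=1
)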