I'm trying to perform PEFT with LoRA on the Google FLAN-T5 base model, using the Python code below. I'm running the code on an NVIDIA GPU with 8 GB of RAM on Ubuntu Server 18.04 LTS. In the code I load a public dataset from Hugging Face, load the pre-trained FLAN-T5 model, and set up the PEFT LoRA configuration.
I then add the LoRA adapter layers to the original LLM and define a Trainer instance, but when I try to train the PEFT adapter and save the model, I get the error below: "No executable batch size found, reached zero."
Can anyone see what the issue might be, and can you suggest how to solve it?
Code:
# import modules
from datasets import load_dataset
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, GenerationConfig, TrainingArguments, Trainer
import torch
import time
import evaluate
import pandas as pd
import numpy as np
# load dataset and LLM
huggingface_dataset_name = "knkarthick/dialogsum"
dataset = load_dataset(huggingface_dataset_name)
# load pre-trained FLAN-T5 model
model_name='google/flan-t5-base'
original_model = AutoModelForSeq2SeqLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_name)
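# NOTE: tokenized_datasets is passed to the Trainer below but its creation is not shown in
# this snippet; an assumed preprocessing step for dialogsum (a sketch only; prompt wording
# and column names are taken from the dataset card, adjust to match your own code) would be:
def tokenize_function(example):
    prompt = ["Summarize the following conversation.\n\n" + d + "\n\nSummary: " for d in example["dialogue"]]
    example["input_ids"] = tokenizer(prompt, padding="max_length", truncation=True, return_tensors="pt").input_ids
    example["labels"] = tokenizer(example["summary"], padding="max_length", truncation=True, return_tensors="pt").input_ids
    return example

tokenized_datasets = dataset.map(tokenize_function, batched=True)
tokenized_datasets = tokenized_datasets.remove_columns(["id", "topic", "dialogue", "summary"])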
# set up PEFT LoRA configuration
from peft import LoraConfig, get_peft_model, TaskType
lora_config = LoraConfig(
    r=32,                            # rank of the LoRA update matrices
    lora_alpha=32,
    target_modules=["q", "v"],       # T5 attention query/value projections
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.SEQ_2_SEQ_LM  # FLAN-T5 is a seq2seq model
)
# add LoRA adapter layers/parameters to the original LLM to be trained
peft_model = get_peft_model(original_model, lora_config)
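# NOTE: print_number_of_trainable_model_parameters is not defined in this snippet; a minimal
# sketch of such a helper (an assumption, the real helper in your notebook may differ) is:
def print_number_of_trainable_model_parameters(model):
    trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
    all_params = sum(p.numel() for p in model.parameters())
    return (f"trainable model parameters: {trainable_params}\n"
            f"all model parameters: {all_params}\n"
            f"percentage of trainable model parameters: {100 * trainable_params / all_params:.2f}%")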
print(print_number_of_trainable_model_parameters(peft_model))
# define training arguments and create Trainer instance
output_dir = f'./peft-dialogue-summary-training-{str(int(time.time()))}'
peft_training_args = TrainingArguments(
    output_dir=output_dir,
    auto_find_batch_size=True,
    learning_rate=1e-3,  # higher learning rate than full fine-tuning
    num_train_epochs=1,
    logging_steps=1,
    max_steps=1
)
peft_trainer = Trainer(
    model=peft_model,
    args=peft_training_args,
    train_dataset=tokenized_datasets["train"],
)
# train PEFT adapter and save the model
peft_trainer.train()
peft_model_path = "./peft-dialogue-summary-checkpoint-local"
peft_trainer.model.save_pretrained(peft_model_path)
tokenizer.save_pretrained(peft_model_path)
Error:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
Cell In[16], line 1
----> 1 peft_trainer.train()
3 peft_model_path="./peft-dialogue-summary-checkpoint-local"
5 peft_trainer.model.save_pretrained(peft_model_path)
File ~/anaconda3/envs/new_llm/lib/python3.10/site-packages/transformers/trainer.py:1664, in Trainer.train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)
1659 self.model_wrapped = self.model
1661 inner_training_loop = find_executable_batch_size(
1662 self._inner_training_loop, self._train_batch_size, args.auto_find_batch_size
1663 )
-> 1664 return inner_training_loop(
1665 args=args,
1666 resume_from_checkpoint=resume_from_checkpoint,
1667 trial=trial,
1668 ignore_keys_for_eval=ignore_keys_for_eval,
1669 )
File ~/anaconda3/envs/new_llm/lib/python3.10/site-packages/accelerate/utils/memory.py:134, in find_executable_batch_size.<locals>.decorator(*args, **kwargs)
132 while True:
133 if batch_size == 0:
--> 134 raise RuntimeError("No executable batch size found, reached zero.")
135 try:
136 return function(batch_size, *args, **kwargs)
RuntimeError: No executable batch size found, reached zero.
Update:
I restarted my kernel and the error went away; I'm not sure why. Perhaps a model from a previous run was still taking up too much GPU memory.
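If leftover GPU memory from a previous run was the cause, it can usually be released without restarting the kernel. A minimal sketch, assuming the old objects are still referenced in the session:

import gc
import torch

del original_model, peft_model, peft_trainer  # drop references from the previous run (only if they exist)
gc.collect()                                  # collect the now-unreferenced Python objects
torch.cuda.empty_cache()                      # return cached CUDA memory to the driver
print(torch.cuda.memory_allocated())          # check how much GPU memory is still allocated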
Try removing auto_find_batch_size=True from TrainingArguments and set the batch size on your own.
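For example, a sketch with an explicit batch size (a per-device batch size of 4 with gradient accumulation is just a reasonable starting point for an 8 GB GPU, not a verified setting):

peft_training_args = TrainingArguments(
    output_dir=output_dir,
    per_device_train_batch_size=4,  # set explicitly instead of auto_find_batch_size
    gradient_accumulation_steps=4,  # optional: keeps a larger effective batch size
    learning_rate=1e-3,
    num_train_epochs=1,
    logging_steps=1,
    max_steps=1
)

If training still runs out of GPU memory, lower per_device_train_batch_size further (even to 1).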