PEFT LoRA Trainer No executable batch size found


I'm trying to fine-tune the weights of a FLAN-T5 model downloaded from Hugging Face, using PEFT and specifically LoRA, with the code below. I'm getting the error "No executable batch size found, reached zero", which seems to be related to the `auto_find_batch_size` parameter that gets passed to `peft_trainer`. I'm running this on Ubuntu Server 18.04 LTS with an NVIDIA GPU that has 8 GB of memory. Can anyone see what the issue might be and suggest how to solve it?

code:

from datasets import load_dataset
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, GenerationConfig, TrainingArguments, Trainer
import torch
import time
import evaluate
import pandas as pd
import numpy as np


#
# ### Load Dataset and LLM


huggingface_dataset_name = "knkarthick/dialogsum"

dataset = load_dataset(huggingface_dataset_name)

dataset


# Load the pre-trained [FLAN-T5 model](https://huggingface.co/docs/transformers/model_doc/flan-t5) and its tokenizer directly from Hugging Face, using the [base version](https://huggingface.co/google/flan-t5-base) of FLAN-T5. Setting `torch_dtype=torch.bfloat16` specifies the data type the model's weights are loaded in.


model_name='google/flan-t5-base'

original_model = AutoModelForSeq2SeqLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_name)



index = 200

dialogue = dataset['test'][index]['dialogue']
summary = dataset['test'][index]['summary']

prompt = f"""
Summarize the following conversation.

{dialogue}

Summary:
"""

inputs = tokenizer(prompt, return_tensors='pt')
output = tokenizer.decode(
    original_model.generate(
        inputs["input_ids"],
        max_new_tokens=200,
    )[0],
    skip_special_tokens=True
)

dash_line = '-'.join('' for x in range(100))


# Updated 11/1/23 to ensure the GPU is used
def tokenize_function(example):
    start_prompt = 'Summarize the following conversation.\n\n'
    end_prompt = '\n\nSummary: '
    prompt = [start_prompt + dialogue + end_prompt for dialogue in example["dialogue"]]
    example['input_ids'] = tokenizer(prompt, padding="max_length", truncation=True,
                                     return_tensors="pt").input_ids.cuda()
    example['labels'] = tokenizer(example["summary"], padding="max_length", truncation=True,
                                  return_tensors="pt").input_ids.cuda()

    return example

# The dataset contains three splits: train, validation, and test.
# The tokenize_function code handles the data across all splits in batches.
tokenized_datasets = dataset.map(tokenize_function, batched=True)
tokenized_datasets = tokenized_datasets.remove_columns(['id', 'topic', 'dialogue', 'summary',])


# To save some time, subsample the dataset:

tokenized_datasets = tokenized_datasets.filter(lambda example, index: index % 100 == 0, with_indices=True)




from peft import LoraConfig, get_peft_model, TaskType

lora_config = LoraConfig(
    r=32, # Rank
    lora_alpha=32,
    target_modules=["q", "v"],
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.SEQ_2_SEQ_LM # FLAN-T5
)


# Add LoRA adapter layers/parameters to the original LLM to be trained.



peft_model = get_peft_model(original_model,
                            lora_config)
# print(print_number_of_trainable_model_parameters(peft_model))


#
# ### Train PEFT Adapter
#
# Define training arguments and create `Trainer` instance.



output_dir = f'/path/LLM/PEFT/train_args/no_log_max_depth_{str(int(time.time()))}'

peft_training_args = TrainingArguments(
    output_dir=output_dir,
    auto_find_batch_size=True,
    learning_rate=1e-3, # Higher learning rate than full fine-tuning.
    num_train_epochs=1
)

peft_trainer = Trainer(
    model=peft_model,
    args=peft_training_args,
    train_dataset=tokenized_datasets["train"],
)




peft_trainer.train()

peft_model_path="/path/LLM/PEFT/peft-dialogue-summary-checkpoint-local"

peft_trainer.model.save_pretrained(peft_model_path)
tokenizer.save_pretrained(peft_model_path)

error:

Found cached dataset csv (/home/username/.cache/huggingface/datasets/knkarthick___csv/knkarthick--dialogsum-cd36827d3490488d/0.0.0/6954658bab30a358235fa864b05cf819af0e179325c740e4bc853bcc7ec513e1)
100%|██████████| 3/3 [00:00<00:00, 1134.31it/s]
/home/username/anaconda3/envs/new_llm/lib/python3.10/site-packages/transformers/optimization.py:407: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
  warnings.warn(
  0%|          | 0/16 [00:00<?, ?it/s]
  0%|          | 0/32 [00:00<?, ?it/s]
  0%|          | 0/63 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/username/stuff/username_storage/LLM/PEFT/offline_peft_train_no_log_max_depth.py", line 161, in <module>
    peft_trainer.train()
  File "/home/username/anaconda3/envs/new_llm/lib/python3.10/site-packages/transformers/trainer.py", line 1664, in train
    return inner_training_loop(
  File "/home/username/anaconda3/envs/new_llm/lib/python3.10/site-packages/accelerate/utils/memory.py", line 134, in decorator
    raise RuntimeError("No executable batch size found, reached zero.")
RuntimeError: No executable batch size found, reached zero.
  0%|          | 0/125 [00:00<?, ?it/s]

1 Answer

Shafiq Jetha:

It could be that `auto_find_batch_size` is not perfect in its search: even the batch sizes it tries may be too big to fit in the currently available VRAM, so the training loop decides it can't continue and errors out. I'm seeing this myself, and that's the conclusion I've come to.
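For context on why it gives up completely, the `accelerate/utils/memory.py` frame in your traceback is accelerate's `find_executable_batch_size` decorator, which is what `auto_find_batch_size=True` relies on: after every CUDA out-of-memory error it retries the wrapped training loop with the batch size halved, and once the batch size reaches zero it raises exactly the error you're seeing. A minimal sketch of that mechanism (the always-failing training step below is simulated, not your actual model):

from accelerate.utils import find_executable_batch_size

# Simulated training step: every batch size "fails" with a CUDA OOM-style error,
# so the decorator halves 8 -> 4 -> 2 -> 1 and then gives up at zero.
@find_executable_batch_size(starting_batch_size=8)
def training_run(batch_size):
    print(f"Trying batch_size={batch_size}")
    raise RuntimeError("CUDA out of memory")  # stand-in for a real OOM in the forward/backward pass

training_run()
# RuntimeError: No executable batch size found, reached zero.

In other words, by the time you see this message the search has already tried and failed at a batch size of 1.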

It might be better to pin the batch size and see if the error is resolved. You would have to manually adjust the batch size to use the available VRAM as effectively as possible; a sketch of what that could look like is below.
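A minimal sketch of pinning the batch size, reusing the names from your script (the specific values are just hypothetical starting points to tune for an 8 GB card, and gradient accumulation is an optional way to keep the effective batch size up):

peft_training_args = TrainingArguments(
    output_dir=output_dir,
    auto_find_batch_size=False,        # turn off the automatic search
    per_device_train_batch_size=4,     # pin it; drop to 2 or 1 if OOM persists
    gradient_accumulation_steps=4,     # optional: effective batch size of 16 without extra VRAM
    learning_rate=1e-3,
    num_train_epochs=1,
)

peft_trainer = Trainer(
    model=peft_model,
    args=peft_training_args,
    train_dataset=tokenized_datasets["train"],
)

If training still fails at a batch size of 1, the memory pressure is coming from something other than the batch size, but at least the failure is explicit instead of being hidden inside the automatic search.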