I am implementing a pretrained transformer model with the Hugging Face transformers library to perform text summarization, and I would like to compare the performance of a fine-tuned BART model given different numbers of encoder layers. My question is: how can I customize the number of encoder layers? The default model has 12 encoder layers; what if, say, I want to keep only the first 6? I found the documentation for BART, but I have no idea how to adapt it to my code (see below). I am new to ML and NLP, so I'd be grateful for a detailed explanation with code. Thank you!
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, Seq2SeqTrainingArguments, Seq2SeqTrainer, DataCollatorForSeq2Seq
# model_checkpoints holds the name/path of the pretrained checkpoint, e.g. 'facebook/bart-large'
tokenizer = AutoTokenizer.from_pretrained(model_checkpoints)
model = AutoModelForSeq2SeqLM.from_pretrained(model_checkpoints)
collator = DataCollatorForSeq2Seq(tokenizer=tokenizer, model=model)
# preprocessing step omitted
# tokenized_data = the tokenized dataset, with 'train' and 'validation' splits
# Seq2SeqTrainingArguments is already imported above, so no 'transformers.' prefix is needed
args = Seq2SeqTrainingArguments(
    'conversation-summ',
    evaluation_strategy='epoch',
    learning_rate=2e-5,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,
    weight_decay=0.01,
    save_total_limit=2,
    num_train_epochs=3,
    predict_with_generate=True,
    eval_accumulation_steps=1,
    fp16=True
)
trainer = Seq2SeqTrainer(
    model,
    args,
    train_dataset=tokenized_data['train'],
    eval_dataset=tokenized_data['validation'],
    data_collator=collator,
    tokenizer=tokenizer,
    compute_metrics=compute_rouge  # user-defined ROUGE metric function
)
trainer.train()
Customizing the number of encoder layers in a pre-trained BART model means modifying the loaded model's architecture. To do so, first load the pre-trained model:
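For example, a minimal sketch reusing the same AutoModelForSeq2SeqLM call as in your script ('facebook/bart-large' is only an illustrative checkpoint; its large variants have 12 encoder layers):

from transformers import AutoModelForSeq2SeqLM

# Load the pre-trained BART checkpoint
model = AutoModelForSeq2SeqLM.from_pretrained(model_checkpoints)  # e.g. 'facebook/bart-large'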
Next, access the encoder of the BART model and keep only its first 6 layers:
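A minimal sketch, assuming the model is a BART seq2seq model as loaded above (for BartForConditionalGeneration, which AutoModelForSeq2SeqLM returns for a BART checkpoint, the encoder layers live in the nn.ModuleList at model.model.encoder.layers):

import torch.nn as nn

# Rebuild the encoder's layer list with only the first 6 (pre-trained) layers
model.model.encoder.layers = nn.ModuleList(model.model.encoder.layers[:6])

Slicing keeps the already-trained weights of layers 0-5; the dropped layers are removed from the module, so they are neither used in the forward pass nor updated during fine-tuning.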
Then update the model's configuration so that it reflects the reduced number of encoder layers:
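Updating the config keeps it consistent with the truncated encoder (this matters if you later call save_pretrained() or rebuild the model from its config):

# Record the new encoder depth in the model's configuration
model.config.encoder_layers = 6

You can sanity-check the change with len(model.model.encoder.layers), and then pass this modified model to your existing Seq2SeqTrainer exactly as in your script.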