In Google Colab I loaded a BERT model using the Hugging Face transformers library and fine-tuned it with Seq2SeqTrainer. I then saved the model to my Google Drive using model.save_pretrained("folder/path"). However, when I load this model in another Google Colab notebook using EncoderDecoderModel.from_pretrained(), I get this message:
The following encoder weights were not tied to the decoder ['bert/pooler']
(printed four times)
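For reference, the load call in the second notebook is roughly this (the path is a placeholder):

from transformers import EncoderDecoderModel

model = EncoderDecoderModel.from_pretrained("/folder/path")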
Now, here's where it gets weird: the model seems to work the first time it's run (with some differences from when it's run in the same Colab notebook it was fine-tuned in). But when I take the output I got and feed it back into the model, I get the exact same output. As in, I input "apple" the first time and get "banana". Then I input "banana" into the model and get "banana" again! Is this normal, or is this because the pooler weights haven't been properly set?
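To make the loop concrete, it looks roughly like this ("apple"/"banana" stand in for my real inputs and outputs, and the exact tokenizer flags may differ):

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# First pass: "apple" in, "banana" out
inputs = tokenizer("apple", return_tensors="pt")
out = model.generate(inputs.input_ids, attention_mask=inputs.attention_mask)
first = tokenizer.decode(out[0], skip_special_tokens=True)

# Second pass: feed the first output back in; I get "banana" again
inputs = tokenizer(first, return_tensors="pt")
out = model.generate(inputs.input_ids, attention_mask=inputs.attention_mask)
print(tokenizer.decode(out[0], skip_special_tokens=True))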
Here is my minimal code sample:
from transformers import (
    BertTokenizer,
    EncoderDecoderModel,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

# Tokenizer used for both training and generation
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Warm-start a BERT2BERT model. With tie_encoder_decoder=True the decoder
# shares the encoder's weights (the pooler has no decoder counterpart,
# hence the warning above)
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-uncased", "bert-base-uncased", tie_encoder_decoder=True
)

# Generation settings
model.config.max_length = 512
model.config.min_length = 10
model.config.no_repeat_ngram_size = 0
model.config.early_stopping = True
model.config.length_penalty = 2.0
model.config.num_beams = 4

training_args = Seq2SeqTrainingArguments(
    predict_with_generate=True,
    fp16=True,
    output_dir="./",
    logging_steps=2,
    save_steps=10,
)

trainer = Seq2SeqTrainer(
    model=model,
    tokenizer=tokenizer,
    args=training_args,
    train_dataset=train,  # `train` is my preprocessed dataset (not shown)
)

trainer.train()
model.save_pretrained("/folder/path")
So I figured out that I got the "The following encoder weights were not tied to the decoder ['bert/pooler']" message because I had tie_encoder_decoder=True set when warm-starting the model before fine-tuning and saving it. When I removed that option, the message went away.
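In other words, the warm-start line becomes the same call without the tying flag:

model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-uncased", "bert-base-uncased"
)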
I'm still having trouble with the output being basically the same after the second run, though there are a few differences now, so it's improved. If someone could explain why I get the same output the second time the model is run, I'd appreciate it, but it's a bit better now than it was.