GPT-2 taking into account output logits in forward call?


I'm using the Huggingface GPT-2 model, specifically GPT2LMHeadModel. I have two versions of this model: one loaded normally, and one that is identical except that I modify some of the output embeddings (model.lm_head). I'm feeding in a batch of sentences in eval mode:

outputs = model(input_ids=test_input_ids, attention_mask=test_attention_mask)

Now, from my understanding, for each input id the model only attends to the tokens to its left (causal attention). Since I've given it multiple input ids and this is a single forward call (not generate), the output logits should never be fed back in as input, correct?
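That understanding is right: in a single forward pass, the causal mask guarantees that the hidden state at position i depends only on tokens 0..i, and logits are never recycled as inputs. A minimal numpy sketch of causal self-attention (a toy single head with no learned weights, just to illustrate the masking) shows that perturbing a later token leaves earlier positions' outputs untouched:

```python
import numpy as np

def causal_attention(x):
    """Toy single-head scaled dot-product attention with a causal mask.
    x: (seq_len, d) token embeddings; projection weights omitted for brevity."""
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)                 # (seq, seq) raw scores
    mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores[mask] = -np.inf                        # position i cannot see j > i
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))
out1 = causal_attention(x)

x2 = x.copy()
x2[-1] += 10.0                                    # perturb ONLY the last token
out2 = causal_attention(x2)

# Earlier positions never attend to the perturbed last token,
# so their outputs are bit-for-bit identical.
print(np.allclose(out1[:-1], out2[:-1]))          # True
```

So if the two models differ only in lm_head, a difference in hidden states has to come from somewhere other than logits leaking back in.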

However, when I input the same input_ids into both models, they give me different hidden states, even though they should be the same. What am I missing here?

1 Answer

Answered by Raj:

I found the issue. GPT-2 uses weight tying: the input embedding layer (model.transformer.wte) and model.lm_head share the same underlying weight tensor. Changing lm_head therefore also changes the input embeddings, which is what was changing the hidden states.
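The mechanism can be sketched in plain numpy: two names bound to one array behave exactly like tied weights, and copying the array before editing breaks the tie. (This is a conceptual sketch, not Hugging Face code; in transformers you would similarly replace the head's weight with a cloned tensor before modifying it.)

```python
import numpy as np

# Toy model: the input embedding table and the LM head share ONE array,
# mirroring GPT-2's weight tying between transformer.wte and lm_head.
vocab, d = 10, 4
rng = np.random.default_rng(0)
wte = rng.normal(size=(vocab, d))   # input embedding table
lm_head = wte                       # tied: same underlying array, not a copy

token_id = 3
before = wte[token_id].copy()
lm_head[token_id] += 1.0            # "edit the output embeddings"

# The input embedding changed too, because both names point to one array.
print(np.array_equal(wte[token_id], before))   # False

# Fix: break the tie by giving lm_head its own copy before editing.
lm_head = wte.copy()
lm_head[token_id] += 1.0
print(np.array_equal(wte[token_id], wte[token_id]))  # wte untouched this time
```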