Implementation (and functional) differences between AutoModelForCausalLMWithValueHead and AutoModelForCausalLM?


Before any of you mark it as "Community Specific" or something else, just look at this question, which you people so proudly have marked as part of the NLP Collective.

I know what AutoModelForCausalLM is. What I'm asking is: in the PEFT LoRA fine-tuning tutorial, the authors used AutoModelForCausalLMWithValueHead, yet if you pick almost any other code or notebook on fine-tuning an LLM with PEFT, you'll find AutoModelForCausalLM being used.

I looked at the official documentation of AutoModelForCausalLMWithValueHead and found:

An autoregressive model with a value head in addition to the language model head

What I want to ask is: how, where, and, more importantly, WHY is this extra ValueHead used?

In case you don't know the answer, please upvote the question rather than trying to close it. Thank you :)

1 Answer

Answered by Shahzeb Naveed:

First things first, this additional ValueHead has nothing to do with PEFT.

Mainly, PPO optimization (an RLHF technique) relies on computing "advantages" associated with taking a particular action (in this case, selecting a token) in a particular state. The advantage is the value of the (state, action) pair minus the value of being in the state. You can review the exact calculation in this function: https://github.com/huggingface/trl/blob/main/trl/trainer/ppo_trainer.py#L1148
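
For intuition, here is a minimal sketch in plain PyTorch of the Generalized Advantage Estimation (GAE) recurrence that this kind of PPO advantage computation is based on. This is not TRL's actual code; the function name, signature, and default hyperparameters here are illustrative:

```python
import torch

def compute_advantages(values, rewards, gamma=1.0, lam=0.95):
    # values:  (batch, seq_len) value-head estimates V(s_t), one per token
    # rewards: (batch, seq_len) per-token rewards
    seq_len = rewards.shape[-1]
    last_gae = 0.0
    advantages_reversed = []
    for t in reversed(range(seq_len)):
        # V(s_{t+1}) is 0 past the end of the sequence
        next_value = values[:, t + 1] if t < seq_len - 1 else 0.0
        # TD error: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[:, t] + gamma * next_value - values[:, t]
        # GAE recurrence: A_t = delta_t + gamma * lam * A_{t+1}
        last_gae = delta + gamma * lam * last_gae
        advantages_reversed.append(last_gae)
    advantages = torch.stack(advantages_reversed[::-1], dim=1)
    returns = advantages + values  # regression targets for the value head
    return advantages, returns
```

Notice that the whole computation needs V(s_t) for every token position, and that is exactly what the ValueHead provides.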

The additional ValueHead simply projects the last hidden states onto a scalar to estimate the value of a state. Check the ValueHead class implementation here: https://github.com/huggingface/trl/blob/main/trl/models/modeling_value_head.py#L21
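
As a rough sketch of what such a head looks like (see the linked modeling_value_head.py for TRL's actual implementation, which also handles config and dtype details), it is essentially dropout followed by a linear projection to one scalar per token:

```python
import torch.nn as nn

class ValueHead(nn.Module):
    """Rough sketch of a value head: one scalar value estimate per token."""

    def __init__(self, hidden_size, dropout_prob=0.1):
        super().__init__()
        self.dropout = nn.Dropout(dropout_prob)
        # Project each last hidden state (hidden_size) onto a scalar
        self.summary = nn.Linear(hidden_size, 1)

    def forward(self, hidden_states):
        # hidden_states: (batch, seq_len, hidden_size) from the base LM
        output = self.dropout(hidden_states)
        return self.summary(output)  # (batch, seq_len, 1)
```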

Note: The ValueHead class is only needed if you plan to perform this kind of RL training/re-training.
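
To make the difference concrete, here is a hedged usage sketch: loading mirrors AutoModelForCausalLM, but the wrapped model also produces per-token value estimates (the exact return signature may vary across TRL versions):

```python
from trl import AutoModelForCausalLMWithValueHead

# Loads the pretrained causal LM and attaches a (randomly initialized)
# value head on top of its final hidden states.
model = AutoModelForCausalLMWithValueHead.from_pretrained("gpt2")

# During PPO training, a forward pass yields both the LM logits and the
# per-token value estimates, e.g. (signature may differ by version):
# lm_logits, loss, values = model(input_ids=input_ids, attention_mask=mask)
```

For plain supervised fine-tuning with PEFT/LoRA (no RLHF), there is no reward signal to fit a value function to, which is why those tutorials and notebooks use AutoModelForCausalLM.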