Before any of you mark this as "Community Specific" or something else, just look at this question, which you so proudly marked as part of the NLP Collective.
I know what AutoModelForCausalLM is. What I'm asking is: in the PEFT LoRA fine-tuning tutorial, the authors use AutoModelForCausalLMWithValueHead, whereas if you pick any code or notebook on fine-tuning an LLM in the PEFT style, you'll find AutoModelForCausalLM being used.
I went to learn from the official documentation of AutoModelForCausalLMWithValueHead and found:

"An autoregressive model with a value head in addition to the language model head."
What I want to ask is: how, where, and, more importantly, WHY is this extra ValueHead used?
If you don't know the answer, please upvote the question rather than trying to close it. Thank you :)
First things first, this additional ValueHead has nothing to do with PEFT.
Mainly, PPO optimization (an RLHF technique) relies on computing "advantages" associated with taking a particular action (in this case, selecting a token) in a particular state. The computation boils down to the value of the (state, action) pair minus the value of being in that state. You can review the exact calculation in this function: https://github.com/huggingface/trl/blob/main/trl/trainer/ppo_trainer.py#L1148
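For intuition, here is a minimal sketch of that computation. It implements Generalized Advantage Estimation (GAE), the scheme the linked function uses; the function name, signature, and default hyperparameters below are mine, not TRL's:

```python
import torch

def compute_advantages(values, rewards, gamma=1.0, lam=0.95):
    """GAE over per-token values/rewards, both shaped (batch, seq_len)."""
    last_gae = 0.0
    advantages_reversed = []
    seq_len = rewards.shape[-1]
    for t in reversed(range(seq_len)):
        # V(s_{t+1}) is 0 past the end of the generated sequence
        next_values = values[:, t + 1] if t < seq_len - 1 else 0.0
        # TD error: reward plus discounted next-state value, minus current state value
        delta = rewards[:, t] + gamma * next_values - values[:, t]
        last_gae = delta + gamma * lam * last_gae
        advantages_reversed.append(last_gae)
    advantages = torch.stack(advantages_reversed[::-1], dim=1)
    returns = advantages + values  # regression targets for the value head
    return advantages, returns
```

The per-token values here are exactly what the ValueHead produces, which is why PPO training needs the wrapper class.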
The additional ValueHead simply projects the last hidden states onto a scalar to estimate the value of a state. Check the ValueHead class implementation here: https://github.com/huggingface/trl/blob/main/trl/models/modeling_value_head.py#L21
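Conceptually it is tiny. Here is a simplified sketch of the idea (the real class adds a few config options and dtype handling):

```python
import torch
import torch.nn as nn

class ValueHead(nn.Module):
    """Simplified value head: hidden states -> one scalar value per token."""

    def __init__(self, hidden_size: int, dropout: float = 0.1):
        super().__init__()
        self.dropout = nn.Dropout(dropout)
        self.summary = nn.Linear(hidden_size, 1)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # (batch, seq_len, hidden_size) -> (batch, seq_len)
        output = self.summary(self.dropout(hidden_states))
        return output.squeeze(-1)
```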
Note: The ValueHead class is only needed if you plan to perform training/re-training.
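For completeness, a rough usage sketch; I'm assuming the classic trl API, where the forward pass returns the LM logits, an optional loss, and the per-token values (details may differ across trl versions):

```python
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLMWithValueHead.from_pretrained("gpt2")

inputs = tokenizer("The quick brown fox", return_tensors="pt")
lm_logits, loss, values = model(**inputs)
# lm_logits: (batch, seq_len, vocab_size) -- same as AutoModelForCausalLM
# values:    (batch, seq_len)             -- the extra scalar per token,
#                                            consumed only by the PPO trainer
```

For plain inference or supervised (LoRA) fine-tuning, AutoModelForCausalLM is all you need.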