I am attempting to probe the neurons of a Llama-2 7B model in order to understand which neurons are activated by a specific prompt.
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

model_name = "meta-llama/Llama-2-7b-chat-hf"
model = LlamaForCausalLM.from_pretrained(model_name)
tokenizer = LlamaTokenizer.from_pretrained(model_name)

text = "This is a test text"
encoded_input = tokenizer(text, return_tensors='pt')
with torch.no_grad():
    outputs = model(**encoded_input, output_hidden_states=True)
However, I am having a hard time interpreting the hidden states in outputs. My understanding is that Llama-2 7B has 32 transformer layers, which seems to match the 32 items I see in outputs[1] (len(outputs[1]) == 32).
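One thing I am unsure about (this is an assumption on my part, based on reading the transformers docs): when indexing by position, outputs[1] may actually be the past_key_values cache, which also has one entry per layer (hence 32), while the hidden states live under outputs.hidden_states. A minimal way to check the shapes:

hidden_states = outputs.hidden_states
print(len(hidden_states))        # 33 for Llama-2 7B: embedding output + 32 layers
print(hidden_states[0].shape)    # (batch, seq_len, hidden_size), e.g. torch.Size([1, 6, 4096])
print(hidden_states[-1].shape)   # output of the final layer, same shape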
My questions are:
- Which of the items in outputs refers to the neurons?
- How can I see which "neurons" are activated depending on the input prompt? (See my attempted hook sketch after this list.)
- What is the connection between the last hidden state and the generated text? Is the last hidden state in outputs fed into an output layer to determine the first token of the response? (See the sanity check at the end.)
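To make the second question concrete, here is a minimal sketch of what I have in mind. I am assuming that model.model.layers[i].mlp.act_fn (the SiLU activation inside each MLP in the transformers Llama implementation) is the right module to treat as the "neurons"; please correct me if that is the wrong place to hook.

activations = {}

def make_hook(layer_idx):
    def hook(module, inputs, output):
        # output: (batch, seq_len, intermediate_size); the post-SiLU
        # values are what I understand "neuron activations" to mean
        activations[layer_idx] = output.detach()
    return hook

handles = [
    layer.mlp.act_fn.register_forward_hook(make_hook(i))
    for i, layer in enumerate(model.model.layers)
]

with torch.no_grad():
    model(**encoded_input)

for h in handles:
    h.remove()

# e.g. the 10 most strongly activated units in layer 0 at the final token
top = activations[0][0, -1].topk(10)
print(top.indices, top.values)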
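And for the third question, a small sanity check I was planning to run. This again rests on an assumption: that the last entry of outputs.hidden_states is exactly what model.lm_head consumes to produce the logits.

last_hidden = outputs.hidden_states[-1]          # (batch, seq_len, hidden_size)
with torch.no_grad():
    logits_from_hidden = model.lm_head(last_hidden)
print(torch.allclose(logits_from_hidden, outputs.logits, atol=1e-4))

# If that holds, the first generated token would come from the logits
# at the final position of the prompt:
next_token_id = outputs.logits[0, -1].argmax().item()
print(tokenizer.decode(next_token_id))

Is that the correct mental model of how the hidden states connect to the generated text?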