llama2 13B quantized model gives inconsistent results

Some context: I have just started using the model thebloke\llama-2-13b-chat.Q5_K_M.gguf from Hugging Face. I run it through the llama_cpp bindings in Python on a single GPU.

My goal: to retrieve pros and cons from restaurant reviews.

What I am trying to achieve at the moment: I want to test the consistency of the output by running the same question several times and evaluating the generated text. I don't expect identical results, since generation is probabilistic, but I do expect them to be similar.

My issue: in some runs (8 of 31), the generated text seems to be cut off. I don't change the parameters or the prompt between runs, so I would expect similar output, but that is not what I get.

This is my input: Give a precise answer to the question based on the context. Don't be verbose. Context: If you enjoy Indian food, this is a must try restaurant! Great atmosphere and welcoming service. We were at Swad with another couple and shared a few dishes. Be sure and ask for them to come at the same time and not family style as they will come one at a time. I had to try the butter chicken which was at the top of the list for the best I have ever tasted. We ordered two fabulous vegetable dishes, Aloo Gobhi Vegetable Korma, both were wonderful. Lastly we had a delightful white fish that was cooked to perfection. The service was excellent and the food amazing. I strongly recommend reservations on a Friday or Saturday night. Q: what are the pros and cons of this restaurant?\n

These are the possible results:

  • Pros: Great atmosphere, welcoming service, delicious Indian food, best butter chicken, wonderful vegetable dishes, delightful white fish, excellent service. Cons: None mentioned in the review.

  • A: Pros:

  • A: Based on the review, here are the pros and cons of the restaurant:
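To put a number on how often the output is cut, the runs can be tallied automatically. The sketch below is only a heuristic and rests on an assumption: an answer counts as truncated if it is empty or ends with a colon, which matches the two short results listed above. The helper names `looks_truncated` and `tally` are made up for illustration.

```python
from collections import Counter


def looks_truncated(text: str) -> bool:
    """Heuristic (assumption): empty answers or answers ending in ':' were cut short."""
    stripped = text.strip()
    return not stripped or stripped.endswith(":")


def tally(answers):
    """Count how many answers look complete and how many look truncated."""
    return Counter(
        "truncated" if looks_truncated(answer) else "complete"
        for answer in answers
    )


# The three kinds of result quoted in the question.
samples = [
    "Pros: Great atmosphere, welcoming service, delicious Indian food, "
    "best butter chicken, wonderful vegetable dishes, delightful white fish, "
    "excellent service. Cons: None mentioned in the review.",
    "A: Pros:",
    "A: Based on the review, here are the pros and cons of the restaurant:",
]

counts = tally(samples)
print(counts)
```

Applied to the generated text of each run, this gives a quick truncated-versus-complete count without reading every result by hand.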

My code:

from llama_cpp import Llama

output = []
model_path = "models_gguf\\llama-2-13b-chat.Q5_K_M.gguf"
 
review = "If you enjoy Indian food, this is a must try restaurant! Great atmosphere and welcoming service. We were at Swad with another couple and shared a few dishes. Be sure and ask for them to come at the same time and not family style as they will come one at a time. I had to try the butter chicken which was at the top of the list for the best I have ever tasted. We ordered two fabulous vegetable dishes, Aloo Gobhi Vegetable Korma, both were wonderful. Lastly we had a delightful white fish that was cooked to perfection. The service was excellent and the food amazing. I strongly recommend reservations on a Friday or Saturday night."
sys_prompt = "Q: Give a precise answer to the question based on the context. Don't be verbose. Context: "
 
for test_no in range(25):
    # The model is reloaded on every run with identical settings.
    llm = Llama(model_path=model_path,
                n_ctx=2048,
                n_gpu_layers=43,
                temp=0.7,
                top_k=10)
    # echo=True returns the prompt followed by the generated text.
    output.append(llm(sys_prompt + review + " Question: what are the pros and cons of this restaurant?\n A: ",
                      max_tokens=1000,
                      stop=["Q:", "\n"],
                      echo=True))
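One possible explanation, not confirmed for this setup, is the stop list. It contains "\n", so generation ends at the first newline the model produces. If the model sometimes answers on a single line and sometimes starts a multi-line list, the second case would be cut right after "Pros:" or after an introductory sentence ending in a colon. The sketch below imitates stop-sequence handling in plain Python. `apply_stop` is a hypothetical helper, not part of llama_cpp, and both raw continuations are invented for illustration.

```python
def apply_stop(text, stop):
    """Cut text at the earliest occurrence of any stop string (the stop string is dropped)."""
    cut = len(text)
    for marker in stop:
        index = text.find(marker)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]


# Hypothetical raw continuations the model might produce before stop handling.
one_line = "Pros: Great atmosphere, welcoming service. Cons: None mentioned in the review."
multi_line = "Pros:\n- Great atmosphere\n- Welcoming service\nCons:\n- None mentioned"

stop = ["Q:", "\n"]  # the stop list used in the code above

kept_one_line = apply_stop(one_line, stop)      # no stop string present, nothing is cut
kept_multi_line = apply_stop(multi_line, stop)  # cut at the first newline

print(repr(kept_one_line))
print(repr(kept_multi_line))
```

If this is what happens, dropping "\n" from `stop` and keeping only "Q:" should let the multi-line answers through. It may also be worth checking where the sampling settings are passed: in the llama-cpp-python versions I am aware of, `temperature` and `top_k` are arguments of the completion call rather than of the `Llama` constructor, so the values given to the constructor above might not take effect.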