Some context: I have just started using TheBloke's llama-2-13b-chat.Q5_K_M.gguf model from Hugging Face. I am using it through the llama_cpp Python bindings (llama-cpp-python), running on a single GPU.
My goal: to retrieve pros and cons from restaurant reviews.
What I am trying to achieve at the moment: I want to test the consistency of the output by running the same question several times and evaluating the generated text. I don't expect identical results, since generation is probabilistic, but I do expect them to be similar.
My issue: sometimes (8 out of 31 runs) the generated text appears to be cut off. I don't change the parameters or the prompt, so I would expect similar output every time, but this is not the case.
This is my input: Give a precise answer to the question based on the context. Don't be verbose. Context: If you enjoy Indian food, this is a must try restaurant! Great atmosphere and welcoming service. We were at Swad with another couple and shared a few dishes. Be sure and ask for them to come at the same time and not family style as they will come one at a time. I had to try the butter chicken which was at the top of the list for the best I have ever tasted. We ordered two fabulous vegetable dishes, Aloo Gobhi Vegetable Korma, both were wonderful. Lastly we had a delightful white fish that was cooked to perfection. The service was excellent and the food amazing. I strongly recommend reservations on a Friday or Saturday night. Q: what are the pros and cons of this restaurant?\n
These are some of the results I get:
Pros: Great atmosphere, welcoming service, delicious Indian food, best butter chicken, wonderful vegetable dishes, delightful white fish, excellent service. Cons: None mentioned in the review.
A: Pros:
A: Based on the review, here are the pros and cons of the restaurant:
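For reference, this is roughly how I compare the answers to decide whether they are "similar". It is only a sketch: the three example answers above are hard-coded as stand-ins for the real outputs, and the word-count threshold for "looks cut off" is an arbitrary choice of mine.

import difflib

# stand-ins for the real generated answers (the three examples above, abridged)
answers = [
    "Pros: Great atmosphere, welcoming service, delicious Indian food, best butter chicken, wonderful vegetable dishes, delightful white fish, excellent service. Cons: None mentioned in the review.",
    "Pros:",
    "Based on the review, here are the pros and cons of the restaurant:",
]

# pairwise similarity: low ratios hint that two runs produced very different text
for i in range(len(answers)):
    for j in range(i + 1, len(answers)):
        ratio = difflib.SequenceMatcher(None, answers[i], answers[j]).ratio()
        print(f"answers {i} vs {j}: similarity {ratio:.2f}")

# very short answers are the ones that look cut off
for i, a in enumerate(answers):
    if len(a.split()) < 10:  # arbitrary threshold
        print(f"answer {i} looks truncated: {a!r}")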
My code:
from llama_cpp import Llama

model_path = "models_gguf\\llama-2-13b-chat.Q5_K_M.gguf"
review = "If you enjoy Indian food, this is a must try restaurant! Great atmosphere and welcoming service. We were at Swad with another couple and shared a few dishes. Be sure and ask for them to come at the same time and not family style as they will come one at a time. I had to try the butter chicken which was at the top of the list for the best I have ever tasted. We ordered two fabulous vegetable dishes, Aloo Gobhi Vegetable Korma, both were wonderful. Lastly we had a delightful white fish that was cooked to perfection. The service was excellent and the food amazing. I strongly recommend reservations on a Friday or Saturday night."
sys_prompt = "Q: Give a precise answer to the question based on the context. Don't be verbose. Context: "

output = []
for test_no in range(0, 25):
    # re-create the model on every iteration so each run starts from a fresh state
    llm = Llama(model_path=model_path,
                n_ctx=2048,
                n_gpu_layers=43,
                temp=0.7,
                top_k=10)
    # run the prompt and keep the full completion object for later inspection
    output.append(llm(sys_prompt + review + " Question: what are the pros and cons of this restaurant?\n A: ",
                      max_tokens=1000,
                      stop=["Q:", "\n"],
                      echo=True))
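And this is roughly how I look at the results afterwards. It is a minimal sketch, assuming the OpenAI-style completion dicts that llama-cpp-python returns (generated text in choices[0]["text"], plus a finish_reason):

# inspect what each run produced; note that with echo=True the returned text
# also contains the prompt, not just the generated answer
for i, completion in enumerate(output):
    choice = completion["choices"][0]
    text = choice["text"]
    print(f"run {i}: finish_reason={choice['finish_reason']}, {len(text.split())} words")
    print(text)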