I have the following code, which works as expected:
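For completeness, these are the imports the snippet assumes (using the pre-0.10 `llama_index` package layout; your exact module paths may differ depending on the installed version):

```python
from llama_index import ServiceContext, SimpleDirectoryReader, VectorStoreIndex
from llama_index.embeddings import HuggingFaceEmbedding
from llama_index.llms import LlamaCPP
from llama_index.llms.llama_utils import messages_to_prompt, completion_to_prompt
```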
```python
# Load the quantized Llama 2 13B chat model via llama-cpp
model_url = "https://huggingface.co/TheBloke/Llama-2-13B-chat-GGUF/resolve/main/llama-2-13b-chat.Q4_0.gguf"
llm = LlamaCPP(
    model_url=model_url,
    temperature=0.1,
    max_new_tokens=256,
    context_window=3900,
    generate_kwargs={},
    model_kwargs={"n_gpu_layers": 1},
    messages_to_prompt=messages_to_prompt,
    completion_to_prompt=completion_to_prompt,
    verbose=True,
)

# Index the invoice PDF with a local embedding model
reader = SimpleDirectoryReader(input_files=["test_invoice.pdf"])
documents = reader.load_data()
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
service_context = ServiceContext.from_defaults(llm=llm, embed_model=embed_model)
index = VectorStoreIndex.from_documents(documents, service_context=service_context)

# The first query returns the expected answer
query_engine = index.as_query_engine()
response = query_engine.query("How much is the Recomended Amount?")
print(response)
```
However, when I make a second call to `query_engine.query`, it always fails with "Empty Response", even when I ask the exact same question:
```python
# Both of these follow-up calls return "Empty Response"
response = query_engine.query("How much is the Recomended Amount?")
print(response)
response = query_engine.query("How much is the Recomended Amount?")
print(response)
```
Is there something I am missing, or is this a bug in llama-cpp? Thanks in advance.