Need help debugging a llama-index Mistral 7B document chatbot

Until last week the code below was working fine. I'm running it on Google Colab with a T4 GPU runtime. Starting today, though, the response to any question I ask is just '##########'. I need some help identifying the problem.

This is the code for the chatbot:

!pip install -q pypdf
!pip install -q python-dotenv
!pip install -q transformers
!CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python --no-cache-dir
!pip install -q llama-index
!pip install -q sentence-transformers
!pip install -q langchain
import logging
import sys
logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))
from llama_index import VectorStoreIndex, SimpleDirectoryReader, ServiceContext
documents = SimpleDirectoryReader("/content/Data/").load_data()
import torch
from llama_index.llms import LlamaCPP
from llama_index.llms.llama_utils import messages_to_prompt, completion_to_prompt
llm = LlamaCPP(
    # You can pass in the URL to a GGUF model to download it automatically
    model_url='https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF/resolve/main/mistral-7b-instruct-v0.1.Q4_K_M.gguf',
    # optionally, you can set the path to a pre-downloaded model instead of model_url
    model_path=None,
    temperature=0.1,
    max_new_tokens=256,
    # the model supports a context window of at least 4096 tokens, but we set it lower to allow for some wiggle room
    context_window=3900,
    # kwargs to pass to __call__()
    generate_kwargs={},
    # kwargs to pass to __init__()
    # set to at least 1 to use the GPU; -1 offloads all layers
    model_kwargs={"n_gpu_layers": -1},
    # transform inputs into the Llama 2 / Mistral Instruct prompt format
    messages_to_prompt=messages_to_prompt,
    completion_to_prompt=completion_to_prompt,
    verbose=True,
)
from langchain.embeddings.huggingface import HuggingFaceEmbeddings
from llama_index.embeddings import LangchainEmbedding
embed_model = LangchainEmbedding(
  HuggingFaceEmbeddings(model_name="thenlper/gte-large")
)
service_context = ServiceContext.from_defaults(
    chunk_size=256,
    llm=llm,
    embed_model=embed_model
)
index = VectorStoreIndex.from_documents(documents, service_context=service_context)
query_engine = index.as_query_engine()
response = query_engine.query("Enter your query here")
print(response)
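
To isolate where it breaks, one check is to query the model directly and skip retrieval entirely. This is only a debugging sketch: the prompt is arbitrary, and the guess that the GPU offload path is at fault is an assumption, not something I've confirmed.

# Sanity check 1: call the LLM directly, bypassing the index and query engine.
# If this also prints '#' characters, the problem is in llama-cpp-python /
# LlamaCPP itself, not in the embeddings or retrieval.
raw_response = llm.complete("Hello, who are you?")
print(raw_response.text)

# Sanity check 2 (assumption: the CUDA offload path is the culprit): rebuild
# the LLM with GPU offload disabled and repeat the call. If the CPU-only model
# answers normally, the '#' output points at the cuBLAS/GPU build.
cpu_llm = LlamaCPP(
    model_url='https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF/resolve/main/mistral-7b-instruct-v0.1.Q4_K_M.gguf',
    temperature=0.1,
    max_new_tokens=256,
    context_window=3900,
    model_kwargs={"n_gpu_layers": 0},  # 0 = keep everything on the CPU
    messages_to_prompt=messages_to_prompt,
    completion_to_prompt=completion_to_prompt,
    verbose=True,
)
print(cpu_llm.complete("Hello, who are you?").text)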

Beyond these checks, I've tried everything I can think of and still get nothing but a long string of '#' characters.
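
The first cell always installs the newest llama-cpp-python, so a regression in a release published since last week would explain the code breaking without any change on my side. Here's a sketch of how to test that hypothesis; the pinned version below is only an example of an earlier release, not a known-good one:

# Record the version that produces the bad output.
import llama_cpp
print(llama_cpp.__version__)

# Then, in a fresh runtime, pin an earlier release instead of the latest.
# 0.2.11 is illustrative; the right pick is whatever was current last week.
!CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python==0.2.11 --no-cache-dir

If an older version works, that would confirm the regression; if not, the problem is elsewhere.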
