Error with Mistral 7B model in ConversationalRetrievalChain

I'm encountering an issue while using the Mistral 7B model in a ConversationalRetrievalChain. When I input a question such as "What is the highest GDP?", I get an error saying the number of tokens in the input exceeds the maximum context length, and the model then generates a random response that is not relevant to the query. Here's the relevant code:

from langchain.document_loaders.csv_loader import CSVLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS
from langchain.llms import CTransformers
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationalRetrievalChain
import sys

DB_FAISS_PATH = "vectorstore/db_faiss"
loader = CSVLoader(file_path="data/World Happiness Report 2022.csv", encoding="utf-8", csv_args={'delimiter': ','})
data = loader.load()
print(data)

# Split the text into Chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=20)
text_chunks = text_splitter.split_documents(data)

print(len(text_chunks))

# Download Sentence Transformers Embedding From Hugging Face
embeddings = HuggingFaceEmbeddings(model_name = 'sentence-transformers/all-MiniLM-L6-v2')

# Converting the text Chunks into embeddings and saving the embeddings into FAISS Knowledge Base
docsearch = FAISS.from_documents(text_chunks, embeddings)

docsearch.save_local(DB_FAISS_PATH)


#query = "What is the value of GDP per capita of Finland provided in the data?"

#docs = docsearch.similarity_search(query, k=3)

#print("Result", docs)

llm = CTransformers(model="models/mistral-7b-v0.1.Q4_0.gguf",
                    model_type="llama",
                    max_new_tokens=1000,
                    temperature=0.1)

qa = ConversationalRetrievalChain.from_llm(llm, retriever=docsearch.as_retriever())

while True:
    chat_history = []
    #query = "What is the value of  GDP per capita of Finland provided in the data?"
    query = input(f"Input Prompt: ")
    if query == 'exit':
        print('Exiting')
        sys.exit()
    if query == '':
        continue
    result = qa({"question":query, "chat_history":chat_history})
    print("Response: ", result['answer'])`

Problem Statement:

I'm trying to use the Mistral 7B model in a ConversationalRetrievalChain, but I'm running into a token-length error:

Number of tokens (760) exceeded maximum context length (512).

Context:

I'm working on a project that involves using Mistral 7B to answer questions based on a dataset. The dataset contains information about the World Happiness Report 2022.

Steps Taken:

  • Loaded and preprocessed the dataset using langchain.
  • Initialized Mistral 7B with the following parameters:
      • Model: 'models/mistral-7b-v0.1.Q4_0.gguf'
      • Model Type: 'llama'
      • Max New Tokens: 1000
      • Temperature: 0.1
  • Set up a ConversationalRetrievalChain with Mistral 7B as the language model and a retriever based on FAISS.
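
One quick way to see where the 760 tokens come from is to print what the retriever pulls in for the failing query. A rough sketch follows; the k value matches LangChain's default of 4, and the four-characters-per-token figure is only a rule of thumb, not an exact token count:

# Rough diagnostic: how much text do the retrieved chunks contribute to the prompt?
query = "What is the highest GDP?"
docs = docsearch.similarity_search(query, k=4)  # k=4 is the retriever's default
total_chars = sum(len(d.page_content) for d in docs)
print(f"{len(docs)} chunks, ~{total_chars} characters "
      f"(~{total_chars // 4} tokens) before the question and prompt template are added")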

Expected Output:

I expect to receive a meaningful response from Mistral 7B based on the input query.

Additional Information:

I'm using Python and relevant libraries for this project. The dataset I'm working with is from the World Happiness Report 2022.

Environment Details:

  • Python version: 3.11.4
  • Relevant libraries: langchain, ctransformers, sentence-transformers, faiss-cpu

1 Answer

Answered by Aniket Adhav:
# Use the instruct-tuned GGUF build and raise the context window via config;
# context_length needs to cover the retrieved chunks, chat history and question.
llm = CTransformers(
    model="TheBloke/Mistral-7B-Instruct-v0.1-GGUF",
    model_file="mistral-7b-instruct-v0.1.Q4_K_M.gguf",
    config={"max_new_tokens": 2048, "context_length": 4096, "temperature": 0},
)
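
The key change here is the config dict: context_length lifts the 512-token limit that produced the warning, and max_new_tokens stays within that larger window (the snippet also switches to the instruct-tuned GGUF build). Below is a minimal sketch of plugging this llm back into the question's chain; the search_kwargs value and the way chat_history is carried across turns are assumptions, not part of the answer:

from langchain.chains import ConversationalRetrievalChain

# llm is the CTransformers instance above; docsearch is the FAISS store from the question.
# Limiting k keeps the retrieved chunks well inside the 4096-token context window.
qa = ConversationalRetrievalChain.from_llm(
    llm,
    retriever=docsearch.as_retriever(search_kwargs={"k": 3}),
)

chat_history = []  # reuse across turns instead of resetting it on every loop iteration
query = "What is the highest GDP?"
result = qa({"question": query, "chat_history": chat_history})
chat_history.append((query, result["answer"]))
print("Response:", result["answer"])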