ValidationError: 1 validation error for RetrievalQA retriever value is not a valid dict (type=type_error.dict)

I'm working on a vector store QA bot that stores documents from a CSV file, using LangChain and Chroma to create the vector store. I am using the PaLM model to answer questions from the vector store. Here's my code:


import csv
from langchain.docstore.document import Document
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.embeddings import HuggingFaceEmbeddings
import pandas as pd

hf_embed = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

with open("COHORT.csv", newline="") as csvfile:
    reader = csv.reader(csvfile)
    i = 0

    # Iterate through the rows in the CSV file and print each row
    for row in reader:
        s1 = f"The person is having Condition {row[0]}, Condition start date is {row[13]} with year of birth {row[14]}, the person's ethnicity is {row[18]} and race is {row[19]}, the person is a {row[20]} and uses the drug {row[21]} for the treatment"
        print(s1)
        collection_name = f"chatbot2_batch{i}"
        print(collection_name)

        # Create Chroma vector store from the batch
        Vector_db = Chroma.from_texts(
            collection_name=collection_name, texts=s1, embedding=hf_embed, persist_directory="kai3"
        )
        Vector_db.persist()

        pdf_vector_db_path = "kai3"
        db = Chroma(
            collection_name="chatbot2",
            embedding_function=hf_embed,
            persist_directory=pdf_vector_db_path,
        )

        Vector_db.persist()
        i += 1

METHOD 1

from langchain.chains import RetrievalQA
from langchain.llms import GooglePalm

llm = GooglePalm(temperature=0.1, key="XXXXXX")

# Get the retriever from the Chroma vector store
retriever = db.as_retriever()

# Create a RetrievalQA chain
qa_chain = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=retriever, input_key="question")

# Retrieve the answer from the vector store
answer = qa_chain("WHAT's the most used drug?")

# Print the answer
print(answer)

When I try the above method, I get an answer from the pretrained knowledge of the PaLM model, not from the vector store.

METHOD 2


# Create a RetrievalQA chain directly with the Chroma vector store
qa_chain1 = RetrievalQA.from_chain_type(
    llm=llm, chain_type="stuff", retriever=db, input_key="question"
)
# Retrieve the answer from the vector store
answer = qa_chain1("WHAT's the most used drug?")

# Print the answer
print(answer)

When I try this, I get the following error:


---------------------------------------------------------------------------
ValidationError                           Traceback (most recent call last)
<ipython-input-33-6d935d969703> in <cell line: 1>()
----> 1 qa_chain1 = RetrievalQA.from_chain_type(llm=llm,
      2                             chain_type="stuff",
      3                             retriever=db,
      4                             input_key="question",
      5                              )

2 frames
/usr/local/lib/python3.10/dist-packages/pydantic/main.cpython-310-x86_64-linux-gnu.so in pydantic.main.BaseModel.__init__()

ValidationError: 1 validation error for RetrievalQA
retriever
  value is not a valid dict (type=type_error.dict)

There is 1 answer

Answer by Mark McDonald:

You have a few problems in your code.

answer = qa_chain("WHAT's the most used drug?")

This is not a question that can be answered through retrieval. Retrieval only hands the LLM the few documents most similar to the query, whereas this question requires aggregating over the entire dataset.
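As a rough illustration of what "aggregation" means here, the count can be computed directly from the CSV without any vector store. This is only a sketch: the sample data, the column names and the helper most_used_drug are hypothetical stand-ins (the question reads the drug from row[21] of COHORT.csv).

```python
import csv
import io
from collections import Counter

# Hypothetical sample data standing in for COHORT.csv; the real file has
# many more columns and its header names are not shown in the question.
SAMPLE_CSV = """condition,drug
face cancer,Arsenic Trioxide
leukemia,Arsenic Trioxide
asthma,Albuterol
"""


def most_used_drug(csv_text, drug_column):
    """Return (drug, count) for the most frequent value in drug_column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(row[drug_column] for row in reader)
    return counts.most_common(1)[0]


print(most_used_drug(SAMPLE_CSV, "drug"))
```

The point is that every row contributes to the result, which a retriever that returns only a handful of matching documents cannot guarantee.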

Ignoring that for a second and using another question, such as "What drug is used to treat face cancer?" (or some other condition in your dataset), there is a second problem. Inside the loop, your code creates a new DB from a single text, in Vector_db, on every iteration, but you then use db for retrieval. db points at the collection "chatbot2", while the loop only ever writes to "chatbot2_batch{i}", so db is empty and the LLM generates an answer for you from its internal knowledge.
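The name mismatch can be pictured with a toy model. This is a sketch only: ToyStore is a hypothetical stand-in for a persistent store with get-or-create collections, not the Chroma API, and the row texts are placeholders.

```python
# Toy stand-in for a persistent store with named collections.
# This is NOT the Chroma API, just a model of get-or-create semantics.
class ToyStore:
    def __init__(self):
        self._collections = {}

    def get_or_create(self, name):
        # Opening a name that does not exist yet silently creates
        # an empty collection instead of raising an error.
        return self._collections.setdefault(name, [])


store = ToyStore()
rows = ["row 0 text", "row 1 text", "row 2 text"]  # placeholder documents

# What the loop in the question does: one new collection per row.
for i, text in enumerate(rows):
    store.get_or_create(f"chatbot2_batch{i}").append(text)

# What the retriever reads from: a different name, so nothing is found.
print(len(store.get_or_create("chatbot2")))

# Writing every row to the collection that is queried removes the mismatch.
store.get_or_create("chatbot2").extend(rows)
print(len(store.get_or_create("chatbot2")))
```

Under this model the first lookup finds an empty collection and the second finds all three rows, which mirrors why the chain in the question had no documents to answer from.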

Consider this:

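# Build the list of documents, one text per CSV row.
texts = []
with open("COHORT.csv", newline="") as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        s1 = f"The person is having Condition ..."
        print(s1)
        texts.append(s1)

# Create the Chroma vector store once, from *all* of the documents.
# texts must be a flat list of strings, so pass texts, not [texts].
Vector_db = Chroma.from_texts(
    collection_name="chatbot2", texts=texts, embedding=hf_embed, persist_directory="kai3"
)
Vector_db.persist()

# Build the chain on the populated store. Pass a retriever, not the store
# itself: retriever=db is what triggers the ValidationError in METHOD 2.
qa_chain = RetrievalQA.from_chain_type(
    llm=llm, chain_type="stuff", retriever=Vector_db.as_retriever(), input_key="question"
)

>>> # Retrieve the answer from the vector store
>>> answer = qa_chain("What is used to treat face cancer?")
>>> # Print the answer
>>> print(answer)
{'question': 'What is used to treat face cancer?', 'result': 'Arsenic Trioxide'}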