LangChain or Chroma vector store cannot find correct matches


I am using LangChain + Chroma + OpenAI to build a Q&A program that uses a CSV document as its knowledge base.

The CSV file looks like this: [image]

Here is the CSV file: https://1drv.ms/u/s!Asflam6BEzhjgbkdegCGfZ7FI4O1Og?e=2X6ior

Here is the code for creating the embeddings:

from langchain.document_loaders.csv_loader import CSVLoader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores.chroma import Chroma
from langchain.text_splitter import RecursiveCharacterTextSplitter as RCTS

file_path = "Test.csv"

# CSVLoader creates one document per CSV row
csv_loader = CSVLoader(file_path)
doc_pages = csv_loader.load()
print(f"Extracted {file_path} with {len(doc_pages)} rows...")

splitter = RCTS(chunk_size=3000, chunk_overlap=300)
splitted_docs = splitter.split_documents(doc_pages)

embedding = OpenAIEmbeddings()
persist_directory = "docs_t/chroma/"

vectordb = Chroma.from_documents(
    documents=splitted_docs,
    embedding=embedding,
    persist_directory=persist_directory
)

vectordb.persist()

print(vectordb._collection.count())

Here is the testing code:

result = vectordb.similarity_search("what is the Support Item Name for 01_003_0107_1_1", k=3)
for r in result:
    print(r.page_content, end="\n\n")

This testing code returns only unrelated rows, not the one for 01_003_0107_1_1.
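As a sanity check that the row is present in the data at all, independent of the embeddings, a literal substring filter can be run over the row text. This is only a minimal sketch on made-up sample rows: the column names `Support Item Number` and `Support Item Name` and the sample values are assumptions, not the contents of Test.csv, and `rows_as_text` and `exact_matches` are hypothetical helpers that approximate a one-document-per-row "column: value" layout.

```python
import csv
import io

# Hypothetical sample data; the real Test.csv columns and values may differ.
SAMPLE_CSV = """Support Item Number,Support Item Name
01_002_0107_1_1,Example item A
01_003_0107_1_1,Example item B
01_004_0107_1_1,Example item C
"""


def rows_as_text(csv_text):
    """Render each CSV row as 'column: value' lines, one string per row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return ["\n".join(f"{k}: {v}" for k, v in row.items()) for row in reader]


def exact_matches(texts, needle):
    """Return the row texts that contain the needle as a literal substring."""
    return [t for t in texts if needle in t]


texts = rows_as_text(SAMPLE_CSV)
matches = exact_matches(texts, "01_003_0107_1_1")
for m in matches:
    print(m, end="\n\n")
```

With the real data, the same filter could be applied to `r.page_content` for each document in `doc_pages`. If the row shows up there but not in the similarity search, the problem would be on the retrieval side rather than in the loading step.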

Which part of the code causes this issue?
