How to increase the response size of chromadb

I am working on a project where I have to use multiple PDF documents to answer user queries.

I have a load method to load the PDFs from a directory:

from langchain.document_loaders import DirectoryLoader, PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

def loadFiles():
    # Load every PDF from the upload directory
    loader = DirectoryLoader('./static/upload/', glob="./*.pdf", loader_cls=PyPDFLoader)
    documents = loader.load()

    # Split into 1000-character chunks with 50 characters of overlap
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=50)
    texts = text_splitter.split_documents(documents)
    return texts
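For intuition about what chunk_size and chunk_overlap mean here, the splitter moves a window over the text so that neighbouring chunks share some characters. This is only a simplified pure-Python sketch of that idea, not LangChain's actual RecursiveCharacterTextSplitter (which also tries to split on separators like paragraphs and sentences):

```python
def split_text(text, chunk_size=1000, chunk_overlap=50):
    """Naive sliding-window splitter: each chunk starts
    chunk_size - chunk_overlap characters after the previous one."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = split_text("x" * 2500, chunk_size=1000, chunk_overlap=50)
print(len(chunks))     # 3 chunks
print(len(chunks[0]))  # each full chunk is 1000 characters
```

The overlap only guards against cutting a sentence in half at a chunk boundary; it does not change how much text the LLM is allowed to generate.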

I am creating the Chroma database with the code below:

from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings

def createDb(load):
    embeddings = OpenAIEmbeddings()
    persist_directory = './ChromaDb'

    # Embed the chunks and persist the index to disk
    vectordb = Chroma.from_documents(documents=load, embedding=embeddings, persist_directory=persist_directory)
    vectordb.persist()
    return vectordb

Now I am querying the database:

qa_chain = RetrievalQA.from_chain_type(
    llm=OpenAI(temperature=0, model_name="text-davinci-003"),
    retriever=vectordb.as_retriever(),
    chain_type="stuff",
    chain_type_kwargs=chain_type_kwargs,
    return_source_documents=True,
)

I am getting a response, but in some cases it is not the full one.

My source PDF has the following contents:

[screenshot: source file]

while my response shows only part of them:

[screenshot: chromadb response]

I tried increasing chunk_overlap in the text splitter (see loadFiles()), but it does not help. I am expecting the full response, and the response should come from the given PDFs.

I am new to this, so I will be thankful for any help.
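One likely cause of the truncated answers, assuming the stack shown above: it is the completion model, not Chroma, that cuts the output short. OpenAI completion models default to max_tokens=256, so long answers get clipped no matter what the retriever returns; retrieving more chunks (the retriever's k) can also help the model see enough context. A sketch of the settings to try; the values 1024 and 6 are illustrative assumptions, not tested recommendations:

```python
# Assumed tweaks, to be dropped into the existing qa_chain setup:
llm_settings = {
    "temperature": 0,
    "model_name": "text-davinci-003",
    "max_tokens": 1024,  # default is 256, which truncates long answers
}
retriever_settings = {"search_kwargs": {"k": 6}}  # retrieve more chunks (default k is 4)

# With langchain installed and OPENAI_API_KEY set, these plug in as:
#   llm = OpenAI(**llm_settings)
#   retriever = vectordb.as_retriever(**retriever_settings)
print(llm_settings["max_tokens"])  # 1024
```

Note that with chain_type="stuff", all retrieved chunks are stuffed into one prompt, so a larger k also has to fit within the model's context window.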

1 Answer

Answered by j3ffyang:

I rewrote the code based on my understanding of what you want. Please give it a try:

from langchain.document_loaders import DirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.chains import RetrievalQA
from langchain.llms import OpenAI


# Load every PDF under /tmp/
loader = DirectoryLoader('/tmp/', glob="./*.pdf")
documents = loader.load()

# Split into 1000-character chunks with 100 characters of overlap
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000,
                                               chunk_overlap=100)
texts = text_splitter.split_documents(documents)

# Embed locally with a HuggingFace model instead of OpenAI embeddings
embeddings = HuggingFaceEmbeddings(
    model_name="bert-base-multilingual-cased")

# Build and persist the Chroma index
persist_directory = "/tmp/chromadb"
vectordb = Chroma.from_documents(documents=texts, embedding=embeddings, persist_directory=persist_directory)
vectordb.persist()


qa_chain = RetrievalQA.from_chain_type(
        llm=OpenAI(temperature=0, model_name="text-davinci-003"),
        retriever=vectordb.as_retriever(), chain_type="stuff",
        # chain_type_kwargs=chain_type_kwargs,
        return_source_documents=True)

response = qa_chain("please summarize this book")
print(response)
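For reference, in the LangChain version used here the chain returns a dict rather than a bare string; with return_source_documents=True the answer sits under "result" and the retrieved chunks under "source_documents". A mocked illustration of that shape only (the values are placeholders, not real model output):

```python
# Shape of the dict RetrievalQA returns (values are placeholders):
response = {
    "query": "please summarize this book",
    "result": "<the model's answer>",
    "source_documents": ["<Document objects for the retrieved chunks>"],
}
print(response["result"])
```

Printing response["source_documents"] is a quick way to check whether the truncation happens at retrieval time (the chunks are incomplete) or at generation time (the chunks are fine but the answer stops early).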