I am working on a project where I have to use multiple PDF documents to answer user queries.
I have a load method that loads the PDFs from a directory and splits them into chunks:
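from langchain.document_loaders import DirectoryLoader, PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

def loadFiles():
    # Load every PDF in the upload directory, one Document per page
    loader = DirectoryLoader('./static/upload/', glob="./*.pdf", loader_cls=PyPDFLoader)
    documents = loader.load()
    # Split the pages into overlapping chunks for embedding
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=50)
    texts = text_splitter.split_documents(documents)
    return texts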
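To make the effect of chunk_size and chunk_overlap concrete, here is a minimal sketch of a fixed-size sliding-window chunker. It is a simplified stand-in, not the actual algorithm of RecursiveCharacterTextSplitter (which also tries to break on separators such as paragraphs and sentences). The function name sliding_window_chunks and the sample string are made up for illustration.

```python
from typing import List


def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into character windows; neighbours share chunk_overlap characters.

    Hypothetical helper for illustration only, not part of LangChain.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = sliding_window_chunks(sample, chunk_size=10, chunk_overlap=3)
for chunk in chunks:
    print(chunk)
```

In this sketch a larger overlap only repeats more text at chunk boundaries. It does not change how many chunks are later retrieved for a query.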
I create the Chroma database with the code below:
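from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma

def createDb(load):
    # Embed the chunks and store them in a persistent Chroma collection
    embeddings = OpenAIEmbeddings()
    persist_directory = './ChromaDb'
    vectordb = Chroma.from_documents(documents=load, embedding=embeddings, persist_directory=persist_directory)
    vectordb.persist()
    return vectordb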
Now, I am querying chromadb:
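from langchain.chains import RetrievalQA
from langchain.llms import OpenAI

# vectordb comes from createDb(); chain_type_kwargs is defined elsewhere (not shown)
qa_chain = RetrievalQA.from_chain_type(
    llm=OpenAI(temperature=0, model_name="text-davinci-003"),
    retriever=vectordb.as_retriever(),
    chain_type="stuff",
    chain_type_kwargs=chain_type_kwargs,
    return_source_documents=True,
)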
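As far as I understand it, a "stuff" chain answers only from the chunks the retriever returns, so any chunk outside the top k never reaches the model. The toy sketch below illustrates that idea with a word-overlap score. It is not how Chroma or LangChain rank documents (they use embedding similarity), and the helper names and sample chunks are invented for illustration. In LangChain itself the number of returned chunks can be set with as_retriever(search_kwargs={"k": ...}).

```python
from typing import List


def score(query: str, chunk: str) -> int:
    """Toy relevance score: number of distinct query words found in the chunk."""
    chunk_words = set(chunk.lower().split())
    return sum(1 for word in set(query.lower().split()) if word in chunk_words)


def retrieve_top_k(query: str, chunks: List[str], k: int) -> List[str]:
    """Return the k best-scoring chunks; the rest never reach the prompt."""
    ranked = sorted(chunks, key=lambda c: score(query, c), reverse=True)
    return ranked[:k]


def stuff_prompt(query: str, retrieved: List[str]) -> str:
    """Concatenate the retrieved chunks into one prompt, as a 'stuff' chain does."""
    context = "\n\n".join(retrieved)
    return f"Context:\n{context}\n\nQuestion: {query}"


# Invented sample data: one topic spread over three chunks, plus an unrelated chunk
chunks = [
    "refund policy part one: items can be returned within 30 days",
    "refund policy part two: refunds are paid to the original card",
    "refund policy part three: shipping fees are not refunded",
    "opening hours are nine to five on weekdays",
]
query = "what is the refund policy"

prompt_k2 = stuff_prompt(query, retrieve_top_k(query, chunks, k=2))
prompt_k3 = stuff_prompt(query, retrieve_top_k(query, chunks, k=3))
print(prompt_k2)
```

With k=2 the third part of the policy is missing from the prompt, so the model cannot include it in its answer. With k=3 all three parts are present.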
I do get a response, but in some cases it is incomplete, as shown here:
My source PDF has the following contents:
My response, however, shows only part of that content, as shown below:
I tried increasing the chunk_overlap size as shown in createdb(), but it did not help. I expect a complete response from ChromaDB, and the response should come only from the given PDFs.
I am new to this and would be thankful for any help.
I rewrote the code based on my understanding of what you want. Please give it a try.