I have thousands of PPTX files, and I want to create a chatbot that answers questions using the data in these files. Because the files are large, I decided to use the following approach:
1. Read all PPTX files and generate a summary of each file.
2. Store each summary in a vector database along with the source document's metadata.
3. Query the vector database based on the user's query.
4. Pass the query and the returned documents to an LLM to get the final output.
5. Return the final output and the source document(s) to the user.
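The last two steps (passing the query and the retrieved summaries to the LLM, then returning the answer together with its sources) could be sketched roughly as below. This assumes each retrieved result carries the summary text and a `metadata` dict with a `source` key. The `llm` argument is a hypothetical stand-in for whatever model call is used (any callable that takes a prompt string and returns a string), and `build_prompt` / `answer_with_sources` are illustrative names, not LangChain APIs.

```python
def build_prompt(query, retrieved):
    # retrieved: list of dicts with "page_content" (a file summary) and "metadata".
    context = "\n\n".join(
        f"[{i}] {doc['page_content']}" for i, doc in enumerate(retrieved, start=1)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )


def answer_with_sources(query, retrieved, llm):
    # llm: hypothetical callable, prompt string in, answer string out.
    answer = llm(build_prompt(query, retrieved))
    sources = []
    for doc in retrieved:
        src = doc["metadata"].get("source")
        # Keep first-seen order and skip repeats (harmless with one summary per file).
        if src and src not in sources:
            sources.append(src)
    return {"answer": answer, "sources": sources}


def fake_llm(prompt):
    # Placeholder model so the example runs without any API key.
    return "Revenue grew in EMEA."


retrieved = [
    {"page_content": "Q3 revenue grew in EMEA.", "metadata": {"source": "decks/q3_sales.pptx"}},
    {"page_content": "Onboarding checklist.", "metadata": {"source": "decks/onboarding.pptx"}},
]
result = answer_with_sources("How did Q3 revenue change?", retrieved, fake_llm)
print(result["sources"])  # ['decks/q3_sales.pptx', 'decks/onboarding.pptx']
```

Taking the source list from the stored metadata, rather than asking the LLM to name its sources, means the file paths returned to the user are exactly the ones that were retrieved.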
I am using UnstructuredPowerPointLoader to load the PPTX files and load_summarize_chain to create a summary of each file. The chain returns a string.
How can I store the output of load_summarize_chain in a vector database (chromadb) along with its metadata?
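Because load_summarize_chain returns a plain string, one way to keep the metadata attached is to wrap each summary in a document object that carries a `source` field before indexing it. In LangChain this would roughly be `Document(page_content=summary, metadata={"source": path})` passed to `Chroma.from_documents(docs, embedding)`, or equivalently `Chroma.from_texts(summaries, embedding, metadatas=[...])`; `similarity_search` then returns documents with that metadata intact. The sketch below is a dependency-free stand-in so it runs as-is: `Document` only mimics the shape of LangChain's class, and `embed` and `ToyVectorStore` are hypothetical placeholders for a real embedding model and a Chroma collection.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Document:
    # Mimics the shape of LangChain's Document: text plus a metadata dict.
    page_content: str
    metadata: dict = field(default_factory=dict)


def embed(text):
    # Hypothetical toy embedding (bag-of-words counts); use a real model in practice.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """Hypothetical in-memory stand-in for a Chroma collection."""

    def __init__(self):
        self._rows = []  # list of (embedding, Document) pairs

    def add_documents(self, docs):
        for doc in docs:
            self._rows.append((embed(doc.page_content), doc))

    def similarity_search(self, query, k=4):
        q = embed(query)
        ranked = sorted(self._rows, key=lambda row: cosine(q, row[0]), reverse=True)
        return [doc for _, doc in ranked[:k]]


def summaries_to_documents(summaries):
    # summaries maps each PPTX path to the string returned by the summarize chain.
    return [
        Document(page_content=text, metadata={"source": path})
        for path, text in summaries.items()
    ]


summaries = {
    "decks/q3_sales.pptx": "Quarterly sales review: revenue grew in the EMEA region.",
    "decks/onboarding.pptx": "New hire onboarding: benefits, laptop setup and HR contacts.",
}
store = ToyVectorStore()
store.add_documents(summaries_to_documents(summaries))
hits = store.similarity_search("How did sales revenue change?", k=1)
print(hits[0].metadata["source"])  # decks/q3_sales.pptx
```

The key point is that the metadata travels with each stored item, so whatever the search returns can be traced back to its PPTX file without a separate lookup.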
Also, please let me know whether this approach is correct. A sample code example would be really helpful.