Enhancing ChatGPT's Question-Answering Capabilities with a PDF Dataset and Azure Search: Best Practices?

I'm aiming to enable ChatGPT to answer questions over a large dataset of PDFs through Azure Search, specifically using vector search. My current approach extracts the text and tables from the PDFs with Azure Form Recognizer (now called Azure AI Document Intelligence) and organizes the results into a pandas DataFrame.
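
One likely source of quality loss is that tables flattened into plain text lose their row/column relationships. The sketch below is a minimal illustration, assuming each extracted cell exposes a row index, a column index, and its text. The `Cell` dataclass is a hypothetical stand-in that mirrors the `row_index`, `column_index`, and `content` fields of Document Intelligence table cells. The sketch renders the cells as a Markdown table so each row stays intact when it is later chunked and embedded. Row and column spans are ignored for simplicity.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """Hypothetical stand-in for one extracted table cell (an assumption, not the SDK type)."""
    row_index: int
    column_index: int
    content: str


def table_to_markdown(cells: list[Cell]) -> str:
    """Render extracted cells as a Markdown table, treating row 0 as the header row."""
    n_rows = max(c.row_index for c in cells) + 1
    n_cols = max(c.column_index for c in cells) + 1
    # Start from an empty grid so a missing cell becomes blank instead of shifting columns.
    grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    for c in cells:
        # Escape pipes and flatten newlines so one cell stays one Markdown cell.
        text = c.content.replace("|", "\\|").replace("\n", " ").strip()
        grid[c.row_index][c.column_index] = text
    lines = ["| " + " | ".join(grid[0]) + " |", "|" + "---|" * n_cols]
    for row in grid[1:]:
        lines.append("| " + " | ".join(row) + " |")
    return "\n".join(lines)


cells = [Cell(0, 0, "Year"), Cell(0, 1, "Revenue"), Cell(1, 0, "2022"), Cell(1, 1, "1.2M")]
print(table_to_markdown(cells))
```

The resulting string can be stored in a DataFrame column next to the surrounding page text, so a table is embedded as one coherent unit rather than as scattered cell values.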

I'm currently setting up Azure vector search to index this data and supply ChatGPT with context. However, data quality degrades noticeably once the information has been extracted from the PDFs. How can I optimize this pipeline for the best results? I'm also wondering whether there is a way to bypass the manual extraction step entirely and answer questions directly from the PDFs.
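
The indexing-and-context step can be sketched in plain Python to show what the pipeline does before any Azure service is involved: split each document's text into overlapping word windows, embed each window, retrieve the windows most similar to the question, and place them in the prompt. This is a minimal sketch under stated assumptions: `embed` is a toy term-count vector standing in for a real embedding model, the brute-force `top_k` stands in for the vector index, and the chunk size, overlap, and prompt wording are illustrative values, not tuned recommendations.

```python
import math
import re
from collections import Counter


def chunk_words(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into windows of `size` words; consecutive windows share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):  # last window already reaches the end
            break
    return chunks


def embed(text: str) -> Counter:
    """Toy stand-in for a real embedding model (assumption): lower-cased term counts."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(question: str, chunks: list[str], k: int = 3) -> list[str]:
    """Return the k chunks most similar to the question (what a vector index does at scale)."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Ground the chat model in the retrieved chunks only (illustrative wording)."""
    sources = "\n\n".join(f"[{i}] {c}" for i, c in enumerate(context, 1))
    return ("Answer using only the sources below. If they do not contain the answer, say so.\n\n"
            f"Sources:\n{sources}\n\nQuestion: {question}")


chunks = ["the invoice total is 500 dollars", "the cat sat on the mat", "shipping address is Berlin"]
question = "what is the invoice total"
print(build_prompt(question, top_k(question, chunks, k=1)))
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk. In a real setup, `embed` would be replaced by calls to an embedding model and `top_k` by a vector query against the search index.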
