Extract PDF Content Including Images For RAG


I am trying to build a PDF content extraction and chunking system for RAG in my application. I need to include images from the PDF as URLs so that the LLM can use those images in its response. Most of the solutions I have seen only extract the text content from a PDF. Is there any way to extract both images and text from a PDF?


There is 1 answer

Nick Magnanini - preprocess.co On

PyMuPDF can do this: alongside the text, it can also extract images and tables from a PDF.
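
As a rough sketch of how that could look for the image case (not a definitive implementation): the code below reads each page's text with `page.get_text()`, saves every embedded image found via `page.get_images()` and `doc.extract_image()`, and appends the image URLs to the page's chunk as Markdown links. The helper names (`image_url`, `build_chunk`, `extract_pdf`), the one-chunk-per-page layout, and the `base_url` host are assumptions for illustration; serving the saved files from that URL is up to your application.

```python
from pathlib import Path


def image_url(base_url: str, filename: str) -> str:
    """Build a public URL for a saved image (base_url is a hypothetical host)."""
    return f"{base_url.rstrip('/')}/{filename}"


def build_chunk(page_number: int, text: str, image_urls: list[str]) -> dict:
    """Combine page text with Markdown image links so the LLM can reference them."""
    links = "\n".join(f"![image]({url})" for url in image_urls)
    content = text.strip() + ("\n\n" + links if links else "")
    return {"page": page_number, "content": content, "images": image_urls}


def extract_pdf(pdf_path: str, out_dir: str, base_url: str) -> list[dict]:
    """Return one chunk per page; embedded images are written to out_dir."""
    import fitz  # PyMuPDF; imported here so the helpers above work without it

    Path(out_dir).mkdir(parents=True, exist_ok=True)
    chunks = []
    with fitz.open(pdf_path) as doc:
        for page_number, page in enumerate(doc, start=1):
            text = page.get_text()  # plain text of the page
            urls = []
            for index, img in enumerate(page.get_images(full=True), start=1):
                xref = img[0]  # first tuple item is the image's xref number
                info = doc.extract_image(xref)  # dict with "image" bytes and "ext"
                filename = f"page{page_number}_img{index}.{info['ext']}"
                (Path(out_dir) / filename).write_bytes(info["image"])
                urls.append(image_url(base_url, filename))
            chunks.append(build_chunk(page_number, text, urls))
    return chunks
```

Calling `extract_pdf("report.pdf", "static/pdf_images", "https://cdn.example.com/pdf_images")` (all three values are placeholders) would give you a list of dicts that you can split further or embed as they are. Tables are not handled here; recent PyMuPDF versions provide `page.find_tables()` for that.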