Upload a pdf to chat gpt using the API?

10.1k views Asked by At

The web interface for ChatGPT has an easy pdf upload. Is there an API from openAI that can receive pdfs?

I know there are 3rd party libraries that can read pdf but given there are images and other important information in a pdf, it might be better if a model like GPT 4 Turbo was fed the actual pdf.

I'll state my use case to add more context. I intent to do RAG. Here is my pdf, here is the prompt. Normally I'd append the text at the end of the prompt. I could still do that with a pdf if I extract it myself.

Is this how I'm suppose to do it? Code from here https://platform.openai.com/docs/assistants/tools/code-interpreter

# Upload a file with an "assistants" purpose
file = client.files.create(
  file=open("example.pdf", "rb"),
  purpose='assistants'
)

# Create an assistant using the file ID
assistant = client.beta.assistants.create(
  instructions="You are a personal math tutor. When asked a math question, write and run code to answer the question.",
  model="gpt-4-1106-preview",
  tools=[{"type": "code_interpreter"}],
  file_ids=[file.id]
)

There is an upload endpoint as well, but it seems the intent of those are for fine-tuning and assistants. I think the RAG use case is a normal one and not necessarily related to assistants.

2

There are 2 answers

0
Muhammad Mubashirullah Durrani On

One solution: Convert the pdf to images and feed it to the vision model as multi image inputs https://platform.openai.com/docs/guides/vision.

GPT-4 with vision is not a different model that does worse at text tasks because it has vision, it is simply GPT-4 with vision added

Since its the same model with vision capabilities, this should be sufficient to do both text and image analysis.

You could also choose to extract images from pdf and feed those separately making a multi-model architecture. I have a preference for the first. Ideally experiments should be run to see what produces better results.

Text only + images only VS Images (containing both)

Pdf to image can be done in python locally as can separating img from pdf. It isn't a difficult task requiring support from someone like openAI.

0
fileyfood500 On

I am not aware of an OpenAI API for PDFs, however, I recommend you look at ScholarAI's GPT4 plugin, which reads academic paper PDFs and pulls text and figures. To achieve this, you can use a PDF reading library like tabula-py, and you can use ChatGPT's vision API to review the images.

I asked GPT4 as well, and it also was not aware of any API and does not believe that the GPT4 model has the ability to parse PDFs directly, and that the text must be extracted first.