The ChatGPT web interface makes it easy to upload a PDF. Is there an API from OpenAI that can receive PDFs?
I know there are third-party libraries that can read PDFs, but given that a PDF contains images and other important information, it might be better if a model like GPT-4 Turbo were fed the actual PDF.
I'll state my use case to add more context. I intend to do RAG: here is my PDF, here is the prompt. Normally I'd append the text at the end of the prompt. I could still do that with a PDF if I extract the text myself.
Is this how I'm supposed to do it? Code from here: https://platform.openai.com/docs/assistants/tools/code-interpreter
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Upload a file with an "assistants" purpose
file = client.files.create(
    file=open("example.pdf", "rb"),
    purpose="assistants",
)

# Create an assistant using the file ID
assistant = client.beta.assistants.create(
    instructions="You are a personal math tutor. When asked a math question, write and run code to answer the question.",
    model="gpt-4-1106-preview",
    tools=[{"type": "code_interpreter"}],
    file_ids=[file.id],
)
There is an upload endpoint as well, but it seems the intent of those is fine-tuning and assistants. I think the RAG use case is a normal one and not necessarily tied to assistants.
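For reference, the "extract the text myself and append it to the prompt" flow described above can be sketched roughly as follows. This is only an illustration, not an official pattern: it assumes the `openai` and `pypdf` packages, and the helper names (`extract_pdf_text`, `build_prompt`, `ask_about_pdf`) are my own. Note that this path drops images entirely, which is exactly the limitation the question raises.

```python
def extract_pdf_text(path: str) -> str:
    """Concatenate the text of every page; embedded images are lost this way."""
    from pypdf import PdfReader  # third-party; imported lazily
    reader = PdfReader(path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def build_prompt(question: str, context: str) -> str:
    """Append the extracted document text after the user's question."""
    return f"{question}\n\n--- Document text ---\n{context}"


def ask_about_pdf(path: str, question: str, model: str = "gpt-4-1106-preview") -> str:
    """Plain chat completion with the PDF text stuffed into the prompt."""
    from openai import OpenAI  # third-party; imported lazily
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": build_prompt(question, extract_pdf_text(path))}],
    )
    return resp.choices[0].message.content
```

Usage would be something like `ask_about_pdf("example.pdf", "Summarize this document.")`.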
One solution: convert the PDF to images and feed them to the vision model as multi-image inputs: https://platform.openai.com/docs/guides/vision.
Since it's the same model with vision capabilities, this should be sufficient for both text and image analysis.
You could also choose to extract the images from the PDF and feed those separately, making a multi-modal architecture. I have a preference for the first. Ideally, experiments should be run to see which produces better results:
text only + images only vs. page images (containing both).
PDF-to-image conversion can be done locally in Python, as can separating images from a PDF. It isn't a difficult task requiring support from someone like OpenAI.
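The whole pipeline suggested above — render each page to a PNG locally, base64-encode it, and send all pages as a multi-image vision request — might look something like this. It's a sketch, not a definitive implementation: it assumes `pdf2image` (which in turn needs poppler installed) and the `openai` SDK, and the function names are mine.

```python
import base64
import io


def page_to_content_part(png_bytes: bytes) -> dict:
    """Wrap one rendered page as an image_url content part (data-URL form)."""
    b64 = base64.b64encode(png_bytes).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:image/png;base64,{b64}"},
    }


def pdf_to_content_parts(path: str, dpi: int = 150) -> list[dict]:
    """Render every PDF page to PNG and wrap each as a vision content part."""
    from pdf2image import convert_from_path  # third-party; needs poppler
    parts = []
    for page in convert_from_path(path, dpi=dpi):
        buf = io.BytesIO()
        page.save(buf, format="PNG")
        parts.append(page_to_content_part(buf.getvalue()))
    return parts


def ask_vision(path: str, question: str, model: str = "gpt-4-vision-preview") -> str:
    """Send the question plus all page images in one chat completion."""
    from openai import OpenAI  # third-party; imported lazily
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    content = [{"type": "text", "text": question}] + pdf_to_content_parts(path)
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": content}],
        max_tokens=1024,
    )
    return resp.choices[0].message.content
```

One practical caveat: each request has limits on image count and total size, so long PDFs may need to be chunked across several requests.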