llm = AzureOpenAI(openai_api_key=OPENAI_API_KEY, deployment_name=OPENAI_DEPLOYMENT_NAME, model_name=MODEL_NAME)
# Configure the location of the PDF file.
# Use a raw string so that "\b" is not interpreted as a backspace escape.
pdfReader = PdfReader(r'data\borders.pdf')
# Extract the text from the PDF file.
raw_text = ''
for page in pdfReader.pages:
    text = page.extract_text()
    if text:
        raw_text += text
# Show first 1000 characters of the text.
raw_text[:1000]
# Split the text into chunks of 1000 characters with 200 characters overlap.
text_splitter = CharacterTextSplitter(
    separator="\n",
    chunk_size=1000,
    chunk_overlap=200,
    length_function=len,
)
pdfTexts = text_splitter.split_text(raw_text)
# Show how many chunks of text are generated.
len(pdfTexts)
# Pass the text chunks to the Embedding Model from Azure OpenAI API to generate embeddings.
embeddings = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY, deployment=OPENAI_EMBEDDING_MODEL_NAME, client="azure", chunk_size=1)
# Use FAISS to index the embeddings. This will allow us to perform a similarity search on the texts using the embeddings.
# https://python.langchain.com/en/latest/modules/indexes/vectorstores/examples/faiss.html
pdfDocSearch = FAISS.from_texts(pdfTexts, embeddings)
# Create a Question Answering chain using the embeddings and the similarity search.
# https://docs.langchain.com/docs/components/chains/index_related_chains
chain = load_qa_chain(llm, chain_type="stuff")
# Perform first sample of question answering.
inquiry = "Who is the author of this book?"
docs = pdfDocSearch.similarity_search(inquiry)
chain.run(input_documents=docs, question=inquiry)
It gives this error: openai.error.InvalidRequestError: The completion operation does not work with the specified model, gpt-4. Please choose different model and try again. You can learn more about which models can be used with each operation here: https://go.microsoft.com/fwlink/?linkid=2197993.
The above error occurs when the model or deployment in your configuration does not support the operation being called; here, the completion operation was called against gpt-4.
According to Document-1 and Document-2, you need the text-davinci-003 model for completion and the text-embedding-ada-002 model for embedding. When I tried with these models, the code executed and gave me output.
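To make the reasoning behind the error concrete, here is a minimal sketch of the check that is effectively failing: a model that only supports the chat operation (such as gpt-4) cannot be called through a completion-style wrapper. The helper names `operation_for` and `wrapper_for` and the `CHAT_ONLY_PREFIXES` tuple are hypothetical and exist only for illustration; the prefix list is an assumption and is not exhaustive, so adjust it to the models actually deployed in your Azure OpenAI resource.

```python
# Assumption: an illustrative, non-exhaustive list of model-name prefixes
# that only support the chat operation. Extend it for your own deployments.
CHAT_ONLY_PREFIXES = ("gpt-4",)


def operation_for(model_name: str) -> str:
    """Return 'chat' for chat-only models, otherwise 'completion'."""
    name = model_name.lower()
    return "chat" if name.startswith(CHAT_ONLY_PREFIXES) else "completion"


def wrapper_for(model_name: str) -> str:
    """Return the name of the LangChain wrapper class matching the operation."""
    if operation_for(model_name) == "chat":
        return "AzureChatOpenAI"
    return "AzureOpenAI"


if __name__ == "__main__":
    for model in ("gpt-4", "text-davinci-003"):
        print(model, "->", operation_for(model), "->", wrapper_for(model))
```

Read this way, there are two ways out of the error: keep `AzureOpenAI` and deploy a completion model such as text-davinci-003, as described above, or keep gpt-4 and switch to a chat wrapper (LangChain provides `AzureChatOpenAI` for this, assuming your installed version includes it).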