I am trying to make a docs question answering program with AzureOpenAI and Langchain

443 views Asked by At
llm = AzureOpenAI(openai_api_key=OPENAI_API_KEY, deployment_name=OPENAI_DEPLOYMENT_NAME, model_name=MODEL_NAME)



# Configure the location of the PDF file.
pdfReader = PdfReader('data\borders.pdf')


# Extract the text from the PDF file.
raw_text = ''
for i, page in enumerate(pdfReader.pages):
    text = page.extract_text()
    if text:
        raw_text += text

# Show first 1000 characters of the text.
raw_text[:1000]


# Split the text into chunks of 1000 characters with 200 characters overlap.
text_splitter = CharacterTextSplitter(        
    separator = "\n",
    chunk_size = 1000,
    chunk_overlap  = 200,
    length_function = len,
)
pdfTexts = text_splitter.split_text(raw_text)


# Show how many chunks of text are generated.
len(pdfTexts)

# Pass the text chunks to the Embedding Model from Azure OpenAI API to generate embeddings.
embeddings = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY, deployment=OPENAI_EMBEDDING_MODEL_NAME, client="azure", chunk_size=1)

# Use FAISS to index the embeddings. This will allow us to perform a similarity search on the texts using the embeddings.
# https://python.langchain.com/en/latest/modules/indexes/vectorstores/examples/faiss.html
pdfDocSearch = FAISS.from_texts(pdfTexts, embeddings)

# Create a Question Answering chain using the embeddings and the similarity search.
# https://docs.langchain.com/docs/components/chains/index_related_chains
chain = load_qa_chain(llm, chain_type="stuff")


# Perform first sample of question answering.
inquiry = "Who is the author of this book?"
docs = pdfDocSearch.similarity_search(inquiry)
chain.run(input_documents=docs, question=inquiry)

It gives this error: openai.error.InvalidRequestError: The completion operation does not work with the specified model, gpt-4. Please choose different model and try again. You can learn more about which models can be used with each operation here: https://go.microsoft.com/fwlink/?linkid=2197993.

2

There are 2 answers

0
Venkatesan On

It gives this error: openai.error.InvalidRequestError: The completion operation does not work with the specified model, gpt-4. Please choose a different model and try again. You can learn more about which models can be used with each operation here.

The above error occurs when you pass the wrong model or incorrect deployment in the configuration.

According to this Document-1 and Document-2 you need text-davinci-003 model for completion and text-embedding-ada-002 model for embedding.

When I tried with the above model the code executed and gave me output.

Code:

from langchain.llms import AzureOpenAI
from PyPDF2 import PdfReader
from langchain.text_splitter import CharacterTextSplitter
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores.faiss import FAISS
from langchain.chains.question_answering import load_qa_chain

OPENAI_API_KEY="xxxxx"
OPENAI_DEPLOYMENT_NAME="testxxxa"    #deployment name with text-embedding-ada-002 model
deployment="textxxx"     #deployment name with text-davinci-003 model
openai_api_base1="xxxxxx"

llm = AzureOpenAI(openai_api_key=OPENAI_API_KEY, deployment_name=deployment,openai_api_base=openai_api_base1,openai_api_version="2022-12-01",openai_api_type="azure")

pdfReader = PdfReader('example.pdf')

raw_text = ''
for i, page in enumerate(pdfReader.pages):
    text = page.extract_text()
    if text:
        raw_text += text

raw_text[:1000]

text_splitter = CharacterTextSplitter(        
    separator = "\n",
    chunk_size = 1000,
    chunk_overlap  = 200,
    length_function = len,
)
pdfTexts = text_splitter.split_text(raw_text)

len(pdfTexts)

embeddings = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY, deployment=OPENAI_DEPLOYMENT_NAME, openai_api_base=openai_api_base1, openai_api_type="azure", openai_api_version="2022-12-01",chunk_size=1)

pdfDocSearch = FAISS.from_texts(pdfTexts, embeddings)
chain = load_qa_chain(llm, chain_type="stuff")
inquiry = "Which month is specified?"
docs = pdfDocSearch.similarity_search(inquiry)
print(chain.run(input_documents=docs, question=inquiry))

Output:

 September

enter image description here

0
Nicolas R On

In OpenAI, you have to main operations regarding text generation:

  • completion
  • chatCompletion

Some models can be used for completion (eg: GPT3.5 version 0301, GPT-4, etc.), other can be used for chatCompletion (eg: GPT3.5 version 0613, GPT-4, etc.).

There is something that is not visible in your code which is the fact that langchain will use OpenAI with a completion operation within its step load_qa_chain.

Doc: https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models#model-summary-table-and-region-availability GPT4 Details

GPT 3.5 details

So in your case, you should pass a deployment which is compliant with a completion query when you set your llm:

llm = AzureOpenAI(openai_api_key=OPENAI_API_KEY, deployment_name=OPENAI_DEPLOYMENT_NAME, model_name=MODEL_NAME)