Is there a way to Use langchain FAISS without an AI?


I'm working on an AI project, but my current problem is that FAISS is taking far too long to load the documents, so I've moved it into its own service via FastAPI.

Everything looks OK, but when I run it I get this error:

Did not find openai_api_key, please add an environment variable `OPENAI_API_KEY`

In my code:

embeddings = OpenAIEmbeddings()
db = FAISS.from_documents(documents, embeddings)

Now, I am using OpenAI, but not in this service, so I did not add my key.

From my understanding, it's just taking the text, tokenizing it using OpenAI's token map, and then doing a search to find the nearest related documents based on the query.

Technically, that doesn't actually reach out to OpenAI's servers, does it?

Afterwards, I'm just adding the related documents to the prompt that I send to OpenAI's servers. So if it's sending data to OpenAI twice, that's a tad inefficient, right?

How can I get this to just be its own service? Or am I wasting my time here?


There are 2 answers

Nick ODell (best answer)

Now, I am using OpenAI, but not in this service, so I did not add my key.

Calling FAISS.from_documents(documents, embeddings) embeds the documents. Embedding documents with OpenAIEmbeddings requires an API call to OpenAI for each document.
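Conceptually, that call does something like the following. This is a simplified stdlib-only sketch, not LangChain's actual implementation; the `embed` stub stands in for `OpenAIEmbeddings.embed_documents`, which is the step that makes the network call and therefore needs the API key.

```python
# Simplified sketch of what FAISS.from_documents(documents, embeddings) does.
# `embed` stands in for OpenAIEmbeddings.embed_documents, which calls the
# OpenAI API -- hence the need for OPENAI_API_KEY even if you never chat.

def embed(texts):
    # Placeholder: the real embedder sends each batch of documents to
    # OpenAI and returns one vector per document.
    return [[float(len(t)), float(sum(map(ord, t)) % 97)] for t in texts]

def build_index(documents):
    texts = [d["page_content"] for d in documents]
    vectors = embed(texts)            # <-- this is where the API call happens
    return list(zip(texts, vectors))  # stand-in for the FAISS vector index

index = build_index([{"page_content": "hello"}, {"page_content": "world"}])
print(len(index))  # one (text, vector) entry per document
```

So the error is raised before any search happens: building the index already requires the embedder.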

Per the documentation:

To use, you should have the openai python package installed, and the environment variable OPENAI_API_KEY set with your API key or pass it as a named parameter to the constructor.

https://api.python.langchain.com/en/latest/embeddings/langchain.embeddings.openai.OpenAIEmbeddings.html

Afterwards, I'm just adding the related documents to the prompt that I send to OpenAI's servers. So if it's sending data to OpenAI twice, that's a tad inefficient, right?

Maybe, but

  1. It's a trivial amount of text data, and
  2. OpenAI doesn't have a vector search product, so any approach that uses both OpenAI embeddings and OpenAI LLMs will require two requests.

Is there a way to Use langchain FAISS without an AI?

There are a few approaches you could take:

  1. Run a local model. This is not "without AI," but I'm guessing you really mean "without OpenAI." There are various language models that can be used to embed a sentence/paragraph into a vector. Here's an example.
  2. Bite the bullet, and use OpenAI or some other API for getting the embeddings. Langchain has a list of supported embeddings here.
  3. Use something like scikit-learn's TfidfVectorizer. This is not AI: each keyword in your input is mapped to one element of the output vector. This is no longer semantic search but keyword search; for example, "street" and "road" would vectorize to totally different things. That might be good enough for your application, though.
Sabrina

You can just point the OpenAIEmbeddings class at your own model and your own API.

Here is an example using BAAI/bge-m3 as the embedder. Just replace EMBEDDING_API_PATH with the URI of the embedder API you created, and make sure your API follows the standard OpenAI request format. Strangely, you need to set an API key even if your own API doesn't require one, so just pass a dummy value like "123" if that's acceptable from a security perspective.

embeddings = OpenAIEmbeddings(
    model="BAAI/bge-m3",
    openai_api_base=EMBEDDING_API_PATH,  # your OpenAI-compatible embedder API
    openai_api_key="123",                # dummy key; the class requires one
    tiktoken_enabled=False,              # OpenAI's tokenizer doesn't apply here
    show_progress_bar=True,
)

And then you can use it just like before:

db = FAISS.from_documents(documents, embeddings)