How to use HuggingFace embbedings with Azure Cognitive Search

Question

How to use HuggingFace embbedings with Azure Cognitive Search

549 views Asked by Pablo At 07 November 2023 at 14:21

I'm using langchain with Azure OpenAI and Azure Cognitive Search. Currently I'm using Azure OpenAI text-embedding-ada-002 model for generating embeddings, but I would like to use a embbeding model from HugginFace if possible, because Azure OpenAI API does not allow to send documents in batches, so I need to make several calls and hit the rate limit.

I tried using this embbeding in my code:

embeddings = SentenceTransformerEmbeddings(
        model_name="all-mpnet-base-v2",
    )

Instead of:

embeddings = OpenAIEmbeddings(
        ...
)

The problem I'm facing, is that when I use AzureSearch's aadd_texts method I get this error:

The vector field 'content_vector' dimensionality must match the field definition's 'dimensions' property. Expected: '1536'. Actual: '768'. (IndexDocumentsFieldError) 98: The vector field 'content_vector' dimensionality must match the field definition's 'dimensions' property. Expected: '1536'. Actual: '768'.
        Code: IndexDocumentsFieldError

I'm pretty lost. Did anyone used an open source embeddings model with Cognitive Search? How?

Original Q&A

There are 2 answers

Gaurav Mantri On 07 November 2023 at 14:52

As the error message indicates, there's a mismatch in the dimension of the vector field.

Based on the documentation for all-mpnet-base-v2 model, the dimension of the vector is 768 (ref: https://huggingface.co/sentence-transformers/all-mpnet-base-v2#all-mpnet-base-v2) whereas the default dimension for a vector field created in Azure Search is 1536 (it assumes that you are going to use text-embedding-ada-002 model from Open AI).

Please check your index definition and make sure that the dimension property of the vector fields is 768. You may need to recreate your index.

**jtsun** · Accepted Answer · 2023-11-09T05:23:53+00:00

Azure OpenAI API does allow to send documents in batches.

Input string or array Input text to get embeddings for, encoded as an array or string. The number of input tokens varies depending on what model you are using. Only text-embedding-ada-002 (Version 2) supports array input.

Please check https://learn.microsoft.com/en-us/azure/ai-services/openai/reference#embeddings.

TechQA.

How to use HuggingFace embbedings with Azure Cognitive Search

There are 2 answers

Related Questions in PYTHON

Related Questions in AZURE-COGNITIVE-SEARCH

Related Questions in LANGCHAIN

Related Questions in PY-LANGCHAIN

Related Questions in OPENAIEMBEDDINGS

Popular Questions

Popular Tags

Trending Questions