Azure search indexer and skillset | how to get rid of warnings when indexing deleted blobs


In the Azure portal, I created a RAG environment with a search instance, a skillset and an indexer by using the Import and Vectorize data option.

Adding documents works like a charm.

When I delete a document from a container (soft delete option enabled) and start the indexer, the blobs are deleted from the container and the related documents (the result of a splitting and embedding skillset) are removed from the search service.

The indexer console shows warnings like these:

Operation: Projection.IndexProjections.SearchIndex.<indexname>

Message: Could not generate projection from input '/document/pages/*'. Check the 'source' or 'sourceContext' property of your projection in your skillset. =$(/document/pages/*) ?map { "chunk": $(/document/pages/*), "doc_name": $(/document/metadata_storage_name), "source": $(/document/owner), "title": $(/document/title), "vector": $(/document/pages/*/vector) }
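
One possible client-side workaround, sketched here only as an illustration: if the indexer status is read programmatically rather than in the portal, the projection warnings can be separated from all other warnings before they are reported. The sketch assumes the status payload exposes a `warnings` list under its last result, with `name` and `message` fields on each entry; those field names, and the helper names `is_projection_warning` and `split_warnings`, are assumptions to verify against the actual response. It does not suppress anything on the service side.

```python
from typing import Any, Dict, List, Tuple

# Prefixes copied from the warning shown above.
PROJECTION_OPERATION_PREFIX = "Projection.IndexProjections.SearchIndex."
PROJECTION_MESSAGE_PREFIX = "Could not generate projection from input"


def is_projection_warning(warning: Dict[str, Any]) -> bool:
    """Return True if a warning looks like the projection warning above.

    Assumption: each warning is a dict with 'name' (the operation) and
    'message' keys. Missing or null values are treated as empty strings.
    """
    name = warning.get("name") or ""
    message = warning.get("message") or ""
    return (
        name.startswith(PROJECTION_OPERATION_PREFIX)
        and message.startswith(PROJECTION_MESSAGE_PREFIX)
    )


def split_warnings(
    last_result: Dict[str, Any],
) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Split warnings into (projection warnings, everything else).

    Assumption: 'last_result' carries a 'warnings' list; a missing or
    null list is treated as empty.
    """
    warnings = last_result.get("warnings") or []
    projection = [w for w in warnings if is_projection_warning(w)]
    other = [w for w in warnings if not is_projection_warning(w)]
    return projection, other
```

With this split, a monitoring script could report only the second list and merely count the first.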

The skillset in play:


{
  "@odata.context": "https://<myservice>.search.windows.net/$metadata#skillsets/$entity",
  "@odata.etag": "\"**********\"",
  "name": "<my>-skillset",
  "description": "Skillset to chunk documents and generate embeddings",
  "skills": [
    {
      "@odata.type": "#Microsoft.Skills.Text.AzureOpenAIEmbeddingSkill",
      "name": "#1",
      "description": null,
      "context": "/document/pages/*",
      "resourceUri": "https://<myresource>.openai.azure.com",
      "apiKey": "<redacted>",
      "deploymentId": "text-embedding-ada-002",
      "inputs": [
        {
          "name": "text",
          "source": "/document/pages/*"
        }
      ],
      "outputs": [
        {
          "name": "embedding",
          "targetName": "vector"
        }
      ],
      "authIdentity": null
    },
    {
      "@odata.type": "#Microsoft.Skills.Text.SplitSkill",
      "name": "#2",
      "description": "Split skill to chunk documents",
      "context": "/document",
      "defaultLanguageCode": "en",
      "textSplitMode": "pages",
      "maximumPageLength": 5000,
      "pageOverlapLength": 1250,
      "maximumPagesToTake": 0,
      "inputs": [
        {
          "name": "text",
          "source": "/document/content"
        }
      ],
      "outputs": [
        {
          "name": "textItems",
          "targetName": "pages"
        }
      ]
    }
  ],
  "cognitiveServices": null,
  "knowledgeStore": null,
  "indexProjections": {
    "selectors": [
      {
        "targetIndexName": "<myindex>",
        "parentKeyFieldName": "parent_id",
        "sourceContext": "/document/pages/*",
        "mappings": [
          {
            "name": "chunk",
            "source": "/document/pages/*",
            "sourceContext": null,
            "inputs": []
          },
          {
            "name": "vector",
            "source": "/document/pages/*/vector",
            "sourceContext": null,
            "inputs": []
          },
          {
            "name": "doc_name",
            "source": "/document/metadata_storage_name",
            "sourceContext": null,
            "inputs": []
          },
          {
            "name": "title",
            "source": "/document/title",
            "sourceContext": null,
            "inputs": []
          },
          {
            "name": "source",
            "source": "/document/owner",
            "sourceContext": null,
            "inputs": []
          }
        ]
      }
    ],
    "parameters": {
      "projectionMode": "skipIndexingParentDocuments"
    }
  },
  "encryptionKey": null
}
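
To compare the skillset against the path named in the warning, a small sketch like the following can list which projected fields are read from beneath each selector's `sourceContext`. The JSON fragment is a trimmed copy of the `indexProjections` section above; the function name `mappings_under_context` is hypothetical, and the prefix check is a simplification rather than how the service resolves paths.

```python
import json

# Trimmed copy of the indexProjections section of the skillset above.
SKILLSET_FRAGMENT = """
{
  "indexProjections": {
    "selectors": [
      {
        "targetIndexName": "<myindex>",
        "parentKeyFieldName": "parent_id",
        "sourceContext": "/document/pages/*",
        "mappings": [
          {"name": "chunk", "source": "/document/pages/*"},
          {"name": "vector", "source": "/document/pages/*/vector"},
          {"name": "doc_name", "source": "/document/metadata_storage_name"},
          {"name": "title", "source": "/document/title"},
          {"name": "source", "source": "/document/owner"}
        ]
      }
    ]
  }
}
"""


def mappings_under_context(skillset: dict) -> dict:
    """Map each selector's sourceContext to the field names whose source
    path starts with that context (a plain string-prefix check)."""
    projections = skillset.get("indexProjections") or {}
    result = {}
    for selector in projections.get("selectors") or []:
        context = selector["sourceContext"]
        result[context] = [
            mapping["name"]
            for mapping in selector.get("mappings") or []
            if mapping["source"].startswith(context)
        ]
    return result


if __name__ == "__main__":
    print(mappings_under_context(json.loads(SKILLSET_FRAGMENT)))
```

For this skillset, only `chunk` and `vector` are read from `/document/pages/*`; the other three fields come from document-level paths.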

These are just warnings, and I think I understand why they are shown, but it would be great if there is a way to suppress them or work around them.

Any idea or insight is much appreciated.
