In the Azure portal, I created a RAG environment with a search instance, a skillset and an indexer by using the Import and Vectorize data option.
Adding documents works like a charm.
When I delete a document (blob) from the container, with the soft delete option enabled, and then run the indexer, the blobs are deleted from the container and the related documents (the chunks produced by the splitting and embedding skillset) are removed from the search index, as expected.
The indexer console shows warnings like these:
Operation: Projection.IndexProjections.SearchIndex.<indexname>
Message: Could not generate projection from input '/document/pages/*'. Check the 'source' or 'sourceContext' property of your projection in your skillset. =$(/document/pages/*) ?map { "chunk": $(/document/pages/*), "doc_name": $(/document/metadata_storage_name), "source": $(/document/owner), "title": $(/document/title), "vector": $(/document/pages/*/vector) }
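These warnings are reported in the indexer's execution status, which can also be read through the Azure AI Search REST API ("Get Indexer Status"). The sketch below only filters them on the client side when reviewing a run; it does not suppress anything on the service. It assumes API version 2023-11-01, an admin key, and that the warning messages start with the text shown above. The names `fetch_indexer_status`, `split_warnings` and `PROJECTION_WARNING_PREFIX` are hypothetical helpers, not part of any SDK.

```python
import json
import urllib.request

# Assumption: a GA API version that supports index projections.
API_VERSION = "2023-11-01"

# Assumption: the projection warnings raised after blob deletions start with this text.
PROJECTION_WARNING_PREFIX = "Could not generate projection from input"


def fetch_indexer_status(service: str, indexer: str, api_key: str) -> dict:
    """GET the indexer status document from the Azure AI Search REST API."""
    url = (
        f"https://{service}.search.windows.net/indexers/{indexer}/status"
        f"?api-version={API_VERSION}"
    )
    req = urllib.request.Request(url, headers={"api-key": api_key})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def split_warnings(status: dict) -> tuple[list[dict], list[dict]]:
    """Split the last run's warnings into projection warnings and all other warnings."""
    last = status.get("lastResult") or {}  # lastResult is null before the first run
    projection, other = [], []
    for warning in last.get("warnings") or []:
        if warning.get("message", "").startswith(PROJECTION_WARNING_PREFIX):
            projection.append(warning)
        else:
            other.append(warning)
    return projection, other
```

Usage would be along the lines of `projection, other = split_warnings(fetch_indexer_status("<myservice>", "<indexer name>", "<admin key>"))`, then reviewing only `other`. Matching on the message prefix is a heuristic: it would also hide a genuine projection misconfiguration, so it is only reasonable once the skillset is known to work for non-deleted documents.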
The skillset in play:
{
  "@odata.context": "https://<myservice>.search.windows.net/$metadata#skillsets/$entity",
  "@odata.etag": "\"**********\"",
  "name": "<my>-skillset",
  "description": "Skillset to chunk documents and generate embeddings",
  "skills": [
    {
      "@odata.type": "#Microsoft.Skills.Text.AzureOpenAIEmbeddingSkill",
      "name": "#1",
      "description": null,
      "context": "/document/pages/*",
      "resourceUri": "https://<myresource>.openai.azure.com",
      "apiKey": "<redacted>",
      "deploymentId": "text-embedding-ada-002",
      "inputs": [
        {
          "name": "text",
          "source": "/document/pages/*"
        }
      ],
      "outputs": [
        {
          "name": "embedding",
          "targetName": "vector"
        }
      ],
      "authIdentity": null
    },
    {
      "@odata.type": "#Microsoft.Skills.Text.SplitSkill",
      "name": "#2",
      "description": "Split skill to chunk documents",
      "context": "/document",
      "defaultLanguageCode": "en",
      "textSplitMode": "pages",
      "maximumPageLength": 5000,
      "pageOverlapLength": 1250,
      "maximumPagesToTake": 0,
      "inputs": [
        {
          "name": "text",
          "source": "/document/content"
        }
      ],
      "outputs": [
        {
          "name": "textItems",
          "targetName": "pages"
        }
      ]
    }
  ],
  "cognitiveServices": null,
  "knowledgeStore": null,
  "indexProjections": {
    "selectors": [
      {
        "targetIndexName": "<myindex>",
        "parentKeyFieldName": "parent_id",
        "sourceContext": "/document/pages/*",
        "mappings": [
          {
            "name": "chunk",
            "source": "/document/pages/*",
            "sourceContext": null,
            "inputs": []
          },
          {
            "name": "vector",
            "source": "/document/pages/*/vector",
            "sourceContext": null,
            "inputs": []
          },
          {
            "name": "doc_name",
            "source": "/document/metadata_storage_name",
            "sourceContext": null,
            "inputs": []
          },
          {
            "name": "title",
            "source": "/document/title",
            "sourceContext": null,
            "inputs": []
          },
          {
            "name": "source",
            "source": "/document/owner",
            "sourceContext": null,
            "inputs": []
          }
        ]
      }
    ],
    "parameters": {
      "projectionMode": "skipIndexingParentDocuments"
    }
  },
  "encryptionKey": null
}
These are only warnings, and I think I understand why they are shown, but it would be great if there were a way to suppress them or work around them.
Any ideas or insights are much appreciated.