How to index Blob Storage in Azure?


How do we index blob storage? Is there a .NET SDK available for this? If so, I haven't been able to find it. All I can find are the REST API calls needed to create an index and indexers.

Thanks



Answer by Gaurav Mantri

Blob storage as such is not indexable. What you need to do is use the Azure Search service to pull the data from blob storage into an Azure Search index. That makes the blob storage data searchable.

To pull data from Azure Blob Storage into an Azure Search Service index, you need to create a blob data source and an indexer. The indexer is responsible for fetching the data from the blobs and populating the index.
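A minimal sketch of the two REST request bodies, in Python, assuming the 2020-06-30 REST API version and a target index that already exists. The service name, container, index name, and indexer schedule are hypothetical placeholders; each body would be POSTed to the matching URL with an `api-key` header holding an admin key.

```python
import json

# Hypothetical placeholders: substitute your own service, container and index.
SERVICE_NAME = "my-search-service"
API_VERSION = "2020-06-30"  # assumption: a GA version of the Azure Search REST API
CONTAINER_NAME = "my-container"
INDEX_NAME = "my-blob-index"  # assumed to exist already


def search_url(collection: str) -> str:
    """REST endpoint for a collection such as 'datasources' or 'indexers'."""
    return (f"https://{SERVICE_NAME}.search.windows.net/"
            f"{collection}?api-version={API_VERSION}")


def blob_data_source(name: str, connection_string: str) -> dict:
    """Body for POST /datasources: points the service at a blob container."""
    return {
        "name": name,
        "type": "azureblob",
        "credentials": {"connectionString": connection_string},
        "container": {"name": CONTAINER_NAME},
    }


def blob_indexer(name: str, data_source_name: str) -> dict:
    """Body for POST /indexers: pulls from the data source into the index."""
    return {
        "name": name,
        "dataSourceName": data_source_name,
        "targetIndexName": INDEX_NAME,
        "schedule": {"interval": "PT2H"},  # hypothetical: re-run every 2 hours
    }


if __name__ == "__main__":
    ds = blob_data_source("blob-ds", "<storage-connection-string>")
    ix = blob_indexer("blob-indexer", ds["name"])
    print(search_url("datasources"))
    print(json.dumps(ds, indent=2))
    print(search_url("indexers"))
    print(json.dumps(ix, indent=2))
```

The SDKs wrap these same create-data-source and create-indexer operations, so the shape of the definitions carries over.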

You may find this link useful for indexing blob storage using Azure Search: https://learn.microsoft.com/en-us/azure/search/search-howto-indexing-azure-blob-storage.

There is a .NET SDK available for managing Azure Search Service indexes, data sources, and indexers. You can read more about it here: https://learn.microsoft.com/en-us/dotnet/api/overview/azure/search?view=azure-dotnet. The Azure Search team has also published samples on GitHub that use this SDK. You can find them here: https://github.com/Azure-Samples/search-dotnet-getting-started.

Answer by scott_lotus

BLOB INDEX feature in preview May 2020

Announced 4th May 2020. As with other preview features, it's not recommended for deployment in a production environment.

Worth noting: the key Microsoft disclaimers for using a product or feature in preview are as follows:

  • All previews are excluded from Microsoft SLAs and Warranties

  • Previews might not include customer support from Microsoft

  • Previews might not be carried forward into General Availability

IMO this will very likely make it to GA: it is clearly needed, and similar features are available on other platforms.

Note: To populate the blob index, you define key-value tag attributes on your data, either on new data during upload or on existing data already in your storage account. This is supported on GPv2 storage accounts only.
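For new uploads, the tags travel with the upload request. A minimal Python sketch, assuming the REST `x-ms-tags` header, which carries the tags as a query-string-encoded set of key=value pairs; the helper name is hypothetical and the tag names reuse the example from the blog post. (In the Python SDK, `BlobClient.upload_blob` accepts a `tags` dict and handles this encoding for you.)

```python
from typing import Dict
from urllib.parse import quote, urlencode


def tags_header(tags: Dict[str, str]) -> Dict[str, str]:
    """Encode blob index tags as the query string carried by x-ms-tags."""
    # quote_via=quote percent-encodes spaces as %20 instead of '+'.
    return {"x-ms-tags": urlencode(tags, quote_via=quote)}


headers = tags_header({"Status": "Unprocessed", "Quality": "8K", "Source": "RAW"})
print(headers["x-ms-tags"])  # Status=Unprocessed&Quality=8K&Source=RAW
```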

Posted on 4 May, 2020

Blob Index, a managed secondary index that allows you to store multi-dimensional object attributes describing your data objects in Azure Blob storage, is now available in preview. Built on top of blob storage, Blob Index offers consistent reliability, availability, and performance for all your workloads. Blob Index provides native object management and filtering capabilities, which allow you to categorize and find data based on attribute tags set on the data.

Manage and find data with Blob Index

As datasets get larger, finding specific related objects in a sea of data can be difficult and frustrating. Previously, clients used the ListBlobs API to retrieve up to 5,000 records at a time in lexicographical order, parsed through the list, and repeated until they found the blobs they wanted. Some users also resorted to managing a separate lookup table to find specific objects. These separate tables can get out of sync, increasing cost, complexity, and frustration. Customers should not have to worry about data organization or index table management; they should be able to focus on building powerful applications to grow their business.
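To make the cost concrete, here is a hypothetical in-memory simulation in Python of that client-side approach: page through a sorted listing 5,000 names at a time and filter each page locally. The blob names and the predicate are invented; the point is the round-trip count.

```python
from typing import Callable, List, Tuple

PAGE_SIZE = 5000  # maximum results returned by one List Blobs call


def client_side_find(names: List[str], predicate: Callable[[str], bool],
                     page_size: int = PAGE_SIZE) -> Tuple[List[str], int]:
    """Scan a lexicographically ordered listing page by page, filtering locally.

    Returns the matching names and the number of list calls that were needed.
    """
    ordered = sorted(names)  # List Blobs returns names in lexicographical order
    found: List[str] = []
    calls = 0
    for start in range(0, len(ordered), page_size):
        calls += 1  # one round trip per page
        found.extend(n for n in ordered[start:start + page_size] if predicate(n))
    return found, calls


# Hypothetical container holding one million blobs.
names = [f"video-{i:07d}.mp4" for i in range(1_000_000)]
found, calls = client_side_find(names, lambda n: n == "video-0424242.mp4")
print(found, calls)  # ['video-0424242.mp4'] 200
```

Because every page must be fetched to be sure all matches are found, the number of calls grows linearly with the container size, which is the work a server-side index avoids.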

Blob Index alleviates the data management and querying problem with support for all blob types (Block Blob, Append Blob, and Page Blob). Blob Index is exposed through a familiar blob storage endpoint and APIs, allowing you to easily store and access both your data and classification indices on the same service to reduce application complexity.

To populate the blob index, you define key-value tag attributes on your data, either on new data during upload or on existing data already in your storage account. These blob index tags are stored alongside your underlying blob data. The blob indexing engine then automatically reads the new tags, indexes them, and exposes them to a user-queryable blob index. Using the Azure portal, REST APIs, or SDKs, you can then issue a FindBlobsByTags API call that specifies a set of criteria. Blob storage returns a filtered result set consisting only of the blobs that match the criteria.
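A minimal Python sketch of building the filter expression such a call takes. It assumes the filter syntax as I understand it: tag keys in double quotes, values in single quotes, clauses joined with AND, and an optional `@container` clause to scope the search. The helper name is hypothetical, and it does not escape quotes inside values.

```python
from typing import List, Optional, Tuple

ALLOWED_OPS = {"=", ">", ">=", "<", "<="}


def tag_filter(criteria: List[Tuple[str, str, str]],
               container: Optional[str] = None) -> str:
    """Build a blob index tag filter from (key, operator, value) triples."""
    clauses = []
    if container is not None:
        clauses.append(f"@container = '{container}'")
    for key, op, value in criteria:
        if op not in ALLOWED_OPS:
            raise ValueError(f"unsupported operator: {op}")
        clauses.append(f"\"{key}\" {op} '{value}'")
    return " AND ".join(clauses)


expr = tag_filter([("Status", "=", "Unprocessed"),
                   ("Quality", ">=", "4K"),
                   ("Source", "=", "RAW")])
print(expr)  # "Status" = 'Unprocessed' AND "Quality" >= '4K' AND "Source" = 'RAW'
```

With the Python SDK, a string like this would be passed to `BlobServiceClient.find_blobs_by_tags`.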

The below scenario is an example of how Blob Index works:

In a storage account container with a million blobs, a user uploads a new blob “B2” with the following blob index tags: < Status = Unprocessed, Quality = 8K, Source = RAW >. The blob and its blob index tags are persisted to the storage account, and the account indexing engine exposes the new blob index shortly after. Later on, an encoding application wants to find all unprocessed RAW media files of at least 4K resolution. It issues a FindBlobsByTags API call to find all blobs that match the following criteria: < Status = Unprocessed AND Quality >= 4K AND Source = RAW >. The blob index quickly returns just blob “B2,” the sole blob out of one million blobs that matches the specified criteria. The encoding application can quickly start its processing job, saving idle compute time and money.
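The matching logic in this scenario can be mimicked in memory. A hypothetical Python sketch: every blob other than B2 is invented, and the key assumption (consistent with how blob index tags are documented, as far as I know) is that tag values are strings compared lexicographically.

```python
import operator
from typing import Dict, List, Tuple

# Tag values are strings, so ">=" is a lexicographic comparison ("8K" >= "4K").
OPS = {"=": operator.eq, ">": operator.gt, ">=": operator.ge,
       "<": operator.lt, "<=": operator.le}

Criteria = List[Tuple[str, str, str]]


def matches(tags: Dict[str, str], criteria: Criteria) -> bool:
    """True if every (key, op, value) clause holds for the blob's tags (AND)."""
    return all(key in tags and OPS[op](tags[key], value)
               for key, op, value in criteria)


def find_blobs(blobs: Dict[str, Dict[str, str]], criteria: Criteria) -> List[str]:
    """In-memory stand-in for a FindBlobsByTags call."""
    return sorted(name for name, tags in blobs.items() if matches(tags, criteria))


# Hypothetical container: B2 from the scenario plus a few other blobs.
blobs = {
    "B1": {"Status": "Processed", "Quality": "8K", "Source": "RAW"},
    "B2": {"Status": "Unprocessed", "Quality": "8K", "Source": "RAW"},
    "B3": {"Status": "Unprocessed", "Quality": "1080p", "Source": "RAW"},
    "B4": {"Status": "Unprocessed", "Quality": "4K", "Source": "EDIT"},
}
criteria = [("Status", "=", "Unprocessed"), ("Quality", ">=", "4K"),
            ("Source", "=", "RAW")]
print(find_blobs(blobs, criteria))  # ['B2']
```

Note the lexicographic caveat: "10K" >= "4K" is false because '1' sorts before '4', so numeric-looking tag values are safest when zero-padded to a fixed width.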


Platform feature integrations with Blob Index

Blob Index not only helps you categorize, manage, and find your blob data but also provides integrations with other Blob service features, such as Lifecycle management.

Using the new blobIndexMatch filter, you can move data to cooler tiers or delete data based on the tags applied to your blobs. This lets you write more granular rules that only move or delete data if it matches your specified criteria.

The following sample lifecycle management policy applies to block blobs in the “videofiles” container and tiers objects to archive storage after one day only if the blobs match the blob index tag of Status = ‘Processed’ and Source = ‘RAW’.

Lifecycle management rule with blobIndexMatch example.
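Since the policy image is not reproduced here, this is a reconstruction sketch in Python that emits lifecycle policy JSON matching the description above: block blobs in the “videofiles” container, tiered to archive one day after modification, only when Status = Processed and Source = RAW. The field names follow the lifecycle management policy schema as I understand it; the rule name is hypothetical.

```python
import json
from typing import Dict


def archive_rule(name: str, container: str, tags: Dict[str, str], days: int) -> dict:
    """One lifecycle rule that archives block blobs whose index tags all match."""
    return {
        "enabled": True,
        "name": name,
        "type": "Lifecycle",
        "definition": {
            "actions": {
                "baseBlob": {
                    "tierToArchive": {"daysAfterModificationGreaterThan": days}
                }
            },
            "filters": {
                "blobTypes": ["blockBlob"],
                "prefixMatch": [f"{container}/"],  # scope the rule to the container
                # Every tag condition must hold for the rule to apply.
                "blobIndexMatch": [
                    {"name": key, "op": "==", "value": value}
                    for key, value in tags.items()
                ],
            },
        },
    }


policy = {"rules": [archive_rule("ArchiveProcessedRawVideos", "videofiles",
                                 {"Status": "Processed", "Source": "RAW"}, days=1)]}
print(json.dumps(policy, indent=2))
```

Saved to a file, the JSON could then be applied with `az storage account management-policy create --account-name <account> --resource-group <group> --policy @policy.json`.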


Lifecycle management integration with Blob Index is just the beginning. We will be adding more integrations with other blob platform features soon!

Conditional blob operations with Blob Index tags

In REST versions 2019-10-10 and higher, most blob service APIs now support a new conditional header, x-ms-if-tags, so that an operation only succeeds if the specified blob index tag condition is met. If the condition is not met, the operation fails and the blob is not modified. This can help ensure that data operations only occur on explicitly tagged blobs, and can protect against inadvertent deletion or modification by multi-threaded applications.
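A hypothetical local mimic in Python of a delete guarded by x-ms-if-tags. The header value reuses the tag-filter syntax (double-quoted keys, single-quoted values); the store, blob names, and helper names are invented for illustration, and I'm assuming an unmet condition surfaces as 412 Precondition Failed, like the other conditional headers.

```python
from typing import Dict

Tags = Dict[str, str]


def if_tags_header(required: Tags) -> Tags:
    """x-ms-if-tags header requiring every given tag to equal its value."""
    condition = " AND ".join(f"\"{k}\" = '{v}'" for k, v in required.items())
    return {"x-ms-if-tags": condition}


def conditional_delete(store: Dict[str, Tags], blob: str, required: Tags) -> int:
    """Local mimic of a delete guarded by a tag condition; returns an HTTP status."""
    tags = store[blob]
    if any(tags.get(k) != v for k, v in required.items()):
        return 412  # assumed Precondition Failed; the blob is left untouched
    del store[blob]
    return 202  # Delete Blob reports success with 202 Accepted


store = {"B1": {"Status": "Processed"}, "B2": {"Status": "Unprocessed"}}
guard = {"Status": "Processed"}
print(if_tags_header(guard)["x-ms-if-tags"])  # "Status" = 'Processed'
print(conditional_delete(store, "B2", guard))  # 412
print(conditional_delete(store, "B1", guard))  # 202
print(sorted(store))                           # ['B2']
```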

How to get started

To enroll in the Blob Index preview, submit a request to register this feature for your subscription by running the following PowerShell or CLI commands:

Register by using PowerShell

Register-AzProviderFeature -FeatureName BlobIndex -ProviderNamespace Microsoft.Storage

Register-AzResourceProvider -ProviderNamespace Microsoft.Storage

Register by using Azure CLI

az feature register --namespace Microsoft.Storage --name BlobIndex

az provider register --namespace 'Microsoft.Storage'

After your request is approved, any existing or new General-purpose v2 (GPv2) storage accounts in France Central and France South can use Blob Index’s capabilities. As with most previews, we recommend not using this feature for production workloads until it reaches general availability.

Ref: https://azure.microsoft.com/en-gb/blog/manage-and-find-data-with-blob-index-for-azure-storage-now-in-preview/