Monitor Azure Data Lake Store

I store data as XML files in Data Lake Store, with one folder per source system.

At the end of every day, I would like to run some kind of log analytics to find out how many new XML files were stored in Data Lake Store under each folder. I have enabled Diagnostic Logs and also added the OMS Log Analytics suite.

What is the best way to produce this report?

1 answer

José Lara_MSFT

Yes, you can build an aggregate report (and even create an alert or notification). Using Log Analytics, you can write a query that finds every instance of a file being written to your Azure Data Lake Store, based on either a common root path or a file-naming pattern:

AzureDiagnostics
| where ( ResourceProvider == "MICROSOFT.DATALAKESTORE" )
| where ( OperationName == "create" )
| where ( Path_s contains "/webhdfs/v1/##YOUR PATH##")

Alternatively, the last line could be:

| where ( Path_s contains ".xml")

...or a combination of both.

You can then use this query to create an alert that reports, at a given interval (e.g. every 24 hours), the number of files that were created.

Depending on what you need, you can shape the query in these ways:

  • If you use a common file-naming pattern, match where the path contains that pattern.
  • If you use a common path, match where the path contains the common path.
  • If you want to be notified of all instances (not just specific ones), you can use an aggregating query and an alert that fires when a threshold is reached or exceeded (e.g. 1 or more events):

    AzureDiagnostics
    | where ( ResourceProvider == "MICROSOFT.DATALAKESTORE" )
    | where ( OperationName == "create" )
    | where ( Path_s contains ".xml")
    | summarize AggregatedValue = count() by bin(TimeGenerated, 24h), OperationName
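
The queries above count all matching files together, whereas the question asks for a count per folder. As a rough sketch only: assuming the query results are exported as a list of Path_s values, and assuming each path starts with /webhdfs/v1/ followed by the source-system folder, the per-folder tally could be done in Python along these lines. The function name, the sample paths, and the folder names (crm, erp) are hypothetical and not taken from any real log.

```python
from collections import Counter

# Path prefix as it appears in the queries above; assumed to start every Path_s value.
PREFIX = "/webhdfs/v1/"


def count_new_xml_by_folder(paths):
    """Count .xml paths per top-level folder under PREFIX."""
    counts = Counter()
    for path in paths:
        # Ignore anything outside the store root or without an .xml extension.
        if not path.startswith(PREFIX) or not path.lower().endswith(".xml"):
            continue
        relative = path[len(PREFIX):]
        folder, sep, _ = relative.partition("/")
        if sep:  # skip files that sit directly in the root, outside any folder
            counts[folder] += 1
    return dict(counts)


# Hypothetical sample values; real rows would come from the exported query results.
sample_paths = [
    "/webhdfs/v1/crm/2018/orders.xml",
    "/webhdfs/v1/crm/2018/customers.xml",
    "/webhdfs/v1/erp/invoices.xml",
    "/webhdfs/v1/erp/readme.txt",
]

print(count_new_xml_by_folder(sample_paths))  # {'crm': 2, 'erp': 1}
```

If the folder of interest is nested deeper than the first path segment, the split would need adjusting to match your own layout.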
    

With the query, you can create the alert by following the steps in this blog post: https://azure.microsoft.com/en-gb/blog/control-azure-data-lake-costs-using-log-analytics-to-create-service-alerts/.

Let us know if you have more questions or need additional details.