I have several Event Grid-triggered Python functions on the Linux Consumption plan that execute when new blobs are created in Azure Storage. More than one function instance can run simultaneously if blobs are created at or around the same time. For instance, I have two event triggers that match blobs with the following subject filters:
Trigger 1
"subjectBeginsWith": "/blobServices/default/containers/client1",
"subjectEndsWith": ".txt"
Trigger 2
"subjectBeginsWith": "/blobServices/default/containers/client2",
"subjectEndsWith": ".txt"
If two blobs are created at the same time, I want to limit Azure Functions to running only one application (it doesn't matter which one) at a time to prevent memory issues. This scenario is fairly rare, so I'm considering using the preview setting WEBSITE_MAX_DYNAMIC_APPLICATION_SCALE_OUT to allow only one invocation to run at a time. Would this work, or is there a better way?
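For reference, this is roughly what I'd put in the app settings. (WEBSITE_MAX_DYNAMIC_APPLICATION_SCALE_OUT caps the number of scaled-out instances rather than concurrent executions inside an instance, so I'm assuming I'd also set FUNCTIONS_WORKER_PROCESS_COUNT to keep a single Python worker process per instance — correct me if that assumption is wrong):

```json
{
  "WEBSITE_MAX_DYNAMIC_APPLICATION_SCALE_OUT": "1",
  "FUNCTIONS_WORKER_PROCESS_COUNT": "1"
}
```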
I have another issue: multiple blobs can be created at around the same time for a single client, but I only need one of them to be processed per day. For example, client 1 could have the following files created in a day:
client1/file1_20201024.txt
client1/file2_20201024.txt
client1/file3_20201024.txt
I only need to process one file per day. I can add special handling in the code to check whether the work was already completed and have the Python script return early, but I'm wondering if there is a built-in setting in Event Grid to handle cases like this, i.e., if three blobs are created within one minute, raise only one event instead of three.
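The special handling I have in mind would look roughly like this. It's a sketch under my own assumptions: the marker directory is a local stand-in for whatever durable store I'd actually use (a marker blob or a table entity), and the regex assumes my `client/name_YYYYMMDD.txt` naming convention:

```python
import re
from pathlib import Path
from typing import Optional

# Matches e.g. "client1/file2_20201024.txt" -> client "client1", date "20201024".
DATE_RE = re.compile(r"^(?P<client>[^/]+)/.*_(?P<date>\d{8})\.txt$")

def dedupe_key(blob_path: str) -> Optional[str]:
    """Return a 'client_date' key for the blob, or None if the name doesn't match."""
    m = DATE_RE.match(blob_path)
    return f"{m['client']}_{m['date']}" if m else None

def claim_daily_work(blob_path: str, marker_dir: Path) -> bool:
    """Return True only for the first blob seen for its client/day.

    Creates a marker file atomically (exist_ok=False); a second caller
    for the same client/day hits FileExistsError and returns False, so
    the function body can simply return without doing the work.
    """
    key = dedupe_key(blob_path)
    if key is None:
        return False
    try:
        (marker_dir / f"{key}.done").touch(exist_ok=False)
        return True
    except FileExistsError:
        return False
```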
Issue 1: You should use a Storage Queue event handler (instead of the Azure Function) to push the events onto a queue and then pull them off one at a time. Note that in the case of multiple VMs, you can use a leased-blob technique to control concurrency across the VMs; see more details here.
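A minimal local sketch of the leased-blob idea, with a lock file standing in for the blob lease (in Azure you would acquire a real lease on a blob instead; the class and file names here are illustrative only). The key property is that the lease carries an expiry, so a crashed holder does not block the others forever:

```python
import os
import time
from pathlib import Path

class FileLease:
    """Local stand-in for the blob-lease pattern: the lock file stores an
    expiry timestamp, so a lease held by a crashed process lapses on its own."""

    def __init__(self, path: Path, duration_s: float = 15.0):
        self.path = path
        self.duration_s = duration_s

    def acquire(self) -> bool:
        now = time.time()
        try:
            # O_CREAT | O_EXCL makes creation atomic: only one caller wins.
            fd = os.open(self.path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            # Someone holds the lease; take over only if it has expired.
            try:
                expiry = float(self.path.read_text())
            except (ValueError, OSError):
                return False
            if now < expiry:
                return False
            self.path.write_text(str(now + self.duration_s))
            return True
        with os.fdopen(fd, "w") as f:
            f.write(str(now + self.duration_s))
        return True

    def release(self) -> None:
        self.path.unlink(missing_ok=True)
```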
Issue 2: There is no built-in feature in Azure Event Grid for what you describe, so it must be handled in the subscriber logic.
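For example, the subscriber could collapse a batch of blob-created events to one per client per day before doing any work. A sketch, assuming the event subjects follow your `client/name_YYYYMMDD.txt` convention:

```python
import re
from typing import Iterable, List

# Extracts the client and the YYYYMMDD date from a blob name.
DATE_RE = re.compile(r"^(?P<client>[^/]+)/.*_(?P<date>\d{8})\.txt$")

def collapse_daily_events(subjects: Iterable[str]) -> List[str]:
    """Keep only the first event per (client, date) pair; events whose
    names don't match the convention are passed through unchanged."""
    seen = set()
    kept = []
    for subject in subjects:
        m = DATE_RE.match(subject)
        key = (m["client"], m["date"]) if m else subject
        if key in seen:
            continue
        seen.add(key)
        kept.append(subject)
    return kept
```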