Durable Functions - Is round-robin activity scheduling possible?


I am currently working on an Excel-generating service that takes some IDs as input and generates an Excel file as output. To do that reliably, we've implemented it using Azure Durable Functions with a fan-out strategy. My issue is that several users should be able to start a job without having to wait until other users' jobs have finished processing. Take the following example:

  1. One job with 10,000 IDs as input is started, spawning 1,000 activities in the work-item queue.
  2. Afterwards, a second user starts a job with just one ID as input, spawning a single activity in the work-item queue, but it is queued AFTER the 1,000 initial activities.
  3. The single activity will not be processed until (almost) all 1,000 activities have been processed.
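To make the head-of-line blocking in this example concrete, here is a rough simulation in Python. It is a toy model, not Durable Functions code: it assumes a single FIFO work-item queue, a fixed number of parallel execution slots (`slots`), every activity taking the same `duration`, and all messages already enqueued at time 0. The name `simulate_fifo` is made up for illustration.

```python
import heapq


def simulate_fifo(queue, slots, duration=1.0):
    """Toy model of one FIFO work-item queue feeding `slots` parallel slots.

    Activities are dequeued strictly in order; each takes `duration` time
    units. Returns, per job, the time at which its last activity finishes.
    """
    free_at = [0.0] * slots  # when each slot becomes free (a valid min-heap)
    finish = {}
    for job in queue:  # in-order dequeue, one message at a time
        start = heapq.heappop(free_at)  # earliest available slot
        end = start + duration
        heapq.heappush(free_at, end)
        finish[job] = max(finish.get(job, 0.0), end)
    return finish


# Job "A" fans out 1,000 activities; job "B" enqueues one activity behind them.
queue = ["A"] * 1000 + ["B"]
print(simulate_fifo(queue, slots=10))  # {'A': 100.0, 'B': 101.0}
```

With 10 slots, the one-ID job cannot start until every activity ahead of it has been dispatched, so it finishes at 101 time units even though its own work takes only 1.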

This is obviously not desired, as the second user would have to wait a long time for his simple little job to complete, even though he has no idea the first job was even started.

Is there any way to configure Durable Functions to use some sort of round-robin queue handling, so that activity processing is distributed roughly evenly between orchestrations? This seems like quite an oversight if it isn't possible in DF, although I recognize the inherent limitations of Azure queues.
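For comparison, a round-robin dispatcher would keep one queue per orchestration and take one activity from each non-empty queue in turn. The sketch below is hypothetical and is not something Durable Functions exposes; it uses the same simplified timing assumptions (equal `duration`, a fixed number of `slots`, everything queued at time 0), and `simulate_round_robin` is an illustrative name.

```python
import heapq
from collections import deque


def simulate_round_robin(jobs, slots, duration=1.0):
    """Hypothetical scheduler (NOT a Durable Functions feature).

    `jobs` maps a job id to its number of activities. Each job has its own
    queue, and the dispatcher takes one activity per job in rotation.
    Returns, per job, the time at which its last activity finishes.
    """
    queues = deque(jobs.items())  # (job, remaining activities)
    free_at = [0.0] * slots  # when each slot becomes free (a valid min-heap)
    finish = {}
    while queues:
        job, remaining = queues.popleft()
        start = heapq.heappop(free_at)  # earliest available slot
        end = start + duration
        heapq.heappush(free_at, end)
        finish[job] = max(finish.get(job, 0.0), end)
        if remaining > 1:
            queues.append((job, remaining - 1))  # rotate job to the back
    return finish


print(simulate_round_robin({"A": 1000, "B": 1}, slots=10))  # {'A': 101.0, 'B': 1.0}
```

Here the single-activity job finishes after 1 time unit, while the large job's completion is delayed by only one extra slot-round.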

I'm considering fanning out several times with fewer activities each instead of once with all of them. This would alleviate my issue and allow other orchestrations' activities to run, but it would also degrade throughput and force my orchestrator to replay much more.
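The batched fan-out idea can be sketched as a planning step: split the IDs into activity inputs, then group those activities into waves that the orchestrator schedules and awaits one wave at a time (for example with one `context.task_all(...)` per wave in the Python SDK). The 10 IDs per activity and 100 activities per wave below are assumptions chosen to match the 10,000-ID / 1,000-activity example; `chunk` and `fan_out_in_waves` are hypothetical helpers, not part of any SDK.

```python
def chunk(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def fan_out_in_waves(ids, ids_per_activity, activities_per_wave):
    """Plan an orchestration as a list of waves.

    Each wave is a list of activity inputs (each a list of ids) that the
    orchestrator would schedule together and await before starting the
    next wave.
    """
    activities = chunk(ids, ids_per_activity)
    return chunk(activities, activities_per_wave)


waves = fan_out_in_waves(list(range(10_000)), ids_per_activity=10, activities_per_wave=100)
print(len(waves), len(waves[0]), len(waves[0][0]))  # 10 100 10
```

Because each wave is only enqueued after the previous one completes, an activity from another orchestration that arrives mid-run waits behind roughly one wave from this orchestration (here 100 activities) instead of all 1,000. The cost is the one the question already notes: the orchestrator sits idle while each wave drains, and it has more await points to replay.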


Answer by Chris Gillum:

All activity functions within a single Durable Functions app (or, more specifically, a single task hub) are scheduled using a single work-item queue. Workers dequeue activity messages from this work-item queue one at a time and in order, but process them in parallel.

The amount of parallelism you get for executing activity functions depends on:

  1. The number of workers you have available to process work in your task hub.
  2. The concurrency settings you have configured, which control how many activity functions a single worker will process concurrently.

With that in mind, you can expect these backlogs to happen if your app isn't configured with enough concurrency. For example, if you have 10 VMs with maxConcurrentActivityFunctions set to 200, then in theory you can execute 2,000 activity functions in parallel, meaning that your second user's job doesn't have to wait at all for the first user's big job to finish.
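The arithmetic above can be written out as a small helper. The function names are hypothetical; `maxConcurrentActivityFunctions` is the real per-worker setting, configured in host.json under `extensions.durableTask`. The wait estimate assumes equal activity durations and a fully used capacity, so treat it as a rough estimate rather than a prediction.

```python
def parallel_capacity(workers, per_worker_concurrency):
    """Upper bound on activities running at once across the task hub."""
    return workers * per_worker_concurrency


def trailing_activity_wait(backlog, workers, per_worker_concurrency, duration):
    """Rough wait before an activity queued behind `backlog` others can start.

    Assumes every activity takes `duration` and all slots stay busy, so the
    backlog drains in full "rounds" of `capacity` activities each.
    """
    capacity = parallel_capacity(workers, per_worker_concurrency)
    return (backlog // capacity) * duration


print(parallel_capacity(10, 200))                     # 2000
print(trailing_activity_wait(1000, 10, 200, 5.0))     # 0.0: starts immediately
print(trailing_activity_wait(1000, 1, 10, 5.0))       # 500.0
```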

Of course, depending on how much work the activity functions do, it might not be practical to execute 200 activity functions concurrently on a single worker VM. In that case, you would need to reduce per-worker activity function concurrency and instead increase the number of worker VMs.
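To keep the same total parallelism while lowering per-worker concurrency, the number of workers has to grow proportionally. A hypothetical helper for that calculation, assuming total parallelism is simply workers times per-worker concurrency (which ignores scale-out limits and per-VM resource constraints):

```python
def workers_needed(target_parallelism, per_worker_concurrency):
    """Smallest worker (VM) count that reaches `target_parallelism` when each
    worker runs at most `per_worker_concurrency` activities at once."""
    if per_worker_concurrency <= 0:
        raise ValueError("per_worker_concurrency must be positive")
    # Integer ceiling division: round up to a whole number of workers.
    return -(-target_parallelism // per_worker_concurrency)


for per_worker in (200, 100, 50):
    print(per_worker, workers_needed(2000, per_worker))  # 10, 20, 40 workers
```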