Clarification on running multiple async tasks in parallel with throttling


EDIT: Since the Bulkhead policy needs to be wrapped with a WaitAndRetry policy anyway, I'm leaning towards example 3 as the best solution to keep parallelism, throttling, and Polly retry behavior. It just seems strange, since I thought Parallel.ForEach was for sync operations and Bulkhead would be the better fit for async ones.
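For reference, wrapping a bulkhead inside a WaitAndRetry policy might look like the sketch below. The retry count, backoff, and bulkhead limits are illustrative assumptions, not values from the question; `DoSomething` and `set` refer to the question's own code.

```csharp
using Polly;
using Polly.Bulkhead;

// Bulkhead throttles concurrency; the retry policy wraps *outside* it,
// so a BulkheadRejectedException triggers a delayed retry.
var bulkhead = Policy.BulkheadAsync(maxParallelization: 3, maxQueuingActions: 3);
var retry = Policy
    .Handle<BulkheadRejectedException>()
    .WaitAndRetryAsync(3, attempt => TimeSpan.FromMilliseconds(200 * attempt));

var policy = retry.WrapAsync(bulkhead); // retry is the outer policy

// Each call is throttled by the bulkhead and retried on rejection.
await policy.ExecuteAsync(() => DoSomething(set));
```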

I'm trying to run multiple async Tasks in parallel with throttling using Polly's AsyncBulkheadPolicy. My understanding so far is that the policy method ExecuteAsync does not itself schedule work onto a thread, but leaves that to the default TaskScheduler or whatever came before it. Thus, if my tasks are CPU-bound in some way, then I need to use Parallel.ForEach when executing tasks, or Task.Run() with the ExecuteAsync method, in order to schedule the tasks onto background threads.

Can someone look at the examples below and clarify how they would work in terms of parallelism and thread pooling?

https://github.com/App-vNext/Polly/wiki/Bulkhead - Operation: Bulkhead policy does not create its own threads, it assumes we have already done so.

async Task DoSomething(IEnumerable<object> objects);

//Example 1:
//Simple use, but then I don't have access to retry policies from Polly
Parallel.ForEach(groupedObjects, (set) =>
{
    var task = DoSomething(set);
    task.Wait();
});

//Example 2:
//Uses default TaskScheduler which may or may not run the tasks in parallel
var parallelTasks = new List<Task>();
foreach (var set in groupedObjects)
{
    var task = bulkheadPolicy.ExecuteAsync(() => DoSomething(set));
    parallelTasks.Add(task);
}

await Task.WhenAll(parallelTasks);

//Example 3:
//seems to defeat the purpose of the bulkhead since Parallel.ForEach and
//PolicyBulkheadAsync can both do throttling...just use basic RetryPolicy
//here? 
Parallel.ForEach(groupedObjects, (set) =>
{
    var task = bulkheadPolicy.ExecuteAsync(() => DoSomething(set));
    task.Wait();
});


//Example 4:
//Task.Run still uses the default TaskScheduler and isn't any different from
//Example 2; it just creates more tasks...this is my understanding.
var parallelTasks = new List<Task>();
foreach (var set in groupedObjects)
{
    var task = Task.Run(() => bulkheadPolicy.ExecuteAsync(() => DoSomething(set)));
    parallelTasks.Add(task);
}

await Task.WhenAll(parallelTasks);

DoSomething is an async method doing operations on a set of objects. I'd like this to happen on parallel threads while respecting retry policies from Polly and allowing for throttling.

I seem to have confused myself along the way about what exactly Parallel.ForEach and Bulkhead.ExecuteAsync do functionally when it comes to how tasks/threads are handled.
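For comparison, a common way to throttle purely async work without Parallel.ForEach at all is a SemaphoreSlim gate. This is a minimal sketch, not from the question, assuming DoSomething is async and `groupedObjects` is the same enumerable used above; the concurrency limit of 3 is arbitrary:

```csharp
using System.Threading;

// Allow at most 3 DoSomething calls in flight at once.
var throttle = new SemaphoreSlim(3);
var tasks = groupedObjects.Select(async set =>
{
    await throttle.WaitAsync();
    try { await DoSomething(set); }
    finally { throttle.Release(); }
});
await Task.WhenAll(tasks);
```

Unlike a bulkhead, the semaphore queues all excess work instead of rejecting it, so no retry policy is needed for throttling itself.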

1 Answer

Theodor Zoulias

You are probably right that using Parallel.ForEach defeats the purpose of the bulkhead. I think that a simple loop with a delay will do the job of feeding the bulkhead with tasks. Although I guess that in a real-life example there would be a continuous stream of data, and not a predefined list or array.

using Polly;
using Polly.Bulkhead;

static async Task Main(string[] args)
{
    var groupedObjects = Enumerable.Range(0, 10)
        .Select(n => new object[] { n }); // Create 10 sets to work with
    var bulkheadPolicy = Policy
        .BulkheadAsync(3, 3); // maxParallelization, maxQueuingActions
    var parallelTasks = new List<Task>();
    foreach (var set in groupedObjects)
    {
        Console.WriteLine(@$"Scheduling, Available: {bulkheadPolicy
            .BulkheadAvailableCount}, QueueAvailable: {bulkheadPolicy
            .QueueAvailableCount}");

        // Start the task
        var task = bulkheadPolicy.ExecuteAsync(async () =>
        {
            // Await the task without capturing the context
            await DoSomethingAsync(set).ConfigureAwait(false);
        });
        parallelTasks.Add(task);
        await Task.Delay(50); // Interval between scheduling more tasks
    }

    var whenAllTasks = Task.WhenAll(parallelTasks);
    try
    {
        // Await all the tasks (await throws only one of the exceptions)
        await whenAllTasks;
    }
    catch when (whenAllTasks.IsFaulted) // It might also be canceled
    {
        // Ignore rejections, rethrow other exceptions
        whenAllTasks.Exception.Handle(ex => ex is BulkheadRejectedException);
    }
    Console.WriteLine(@$"Processed: {parallelTasks
        .Where(t => t.Status == TaskStatus.RanToCompletion).Count()}");
    Console.WriteLine($"Faulted: {parallelTasks.Where(t => t.IsFaulted).Count()}");
}

static async Task DoSomethingAsync(IEnumerable<object> set)
{
    // Pretend we are doing something with the set
    await Task.Delay(500).ConfigureAwait(false);
}

Output:

Scheduling, Available: 3, QueueAvailable: 3
Scheduling, Available: 2, QueueAvailable: 3
Scheduling, Available: 1, QueueAvailable: 3
Scheduling, Available: 0, QueueAvailable: 3
Scheduling, Available: 0, QueueAvailable: 2
Scheduling, Available: 0, QueueAvailable: 1
Scheduling, Available: 0, QueueAvailable: 0
Scheduling, Available: 0, QueueAvailable: 0
Scheduling, Available: 0, QueueAvailable: 0
Scheduling, Available: 0, QueueAvailable: 1
Processed: 7
Faulted: 3

Try it on Fiddle.


Update: A slightly more realistic version of DoSomethingAsync, one that actually forces the CPU to do some real work (CPU utilization near 100% on my quad-core machine).

private static async Task DoSomethingAsync(IEnumerable<object> objects)
{
    await Task.Run(() =>
    {
        long sum = 0; for (int i = 0; i < 500000000; i++) sum += i;
    }).ConfigureAwait(false);
}

Note that this method does not run for all the data sets; it runs only for the sets that are not rejected by the bulkhead.