Question
https://msdn.microsoft.com/en-us/library/dd997416(v=vs.110).aspx claims it is possible to write a custom partitioner for Parallel.ForEach
that "provid[es multiple] elements at a time". How does one do this?
Justification
I have a collection of over 750,000 tasks that I'm attempting to execute in parallel with Parallel.For
or Parallel.ForEach
. (I'm currently using For
, but a custom partitioner apparently requires ForEach
.) Total processing time runs about 30 processor minutes, which should take about 4 minutes on eight cores.
"Should" is the operative term. The workloads are highly dissimilar, to the point that just the first five require 20 minutes. This would be fine if they ran on five separate threads, but it seems the default partitioner (quite sensibly) assumes the tasks all require about the same amount of time, and therefore runs the first dozen or so on the same thread.
Since this is not the case with my workload, I would like to write a custom partitioner that hands out the first 32 tasks one at a time, the next 64 two at a time, the next 128 four at a time, and so forth. https://msdn.microsoft.com/en-us/library/dd997416(v=vs.110).aspx claims to provide an example of how to create a custom partitioner. With elisions, the page says:
Each time a partition calls
MoveNext
on the enumerator, the enumerator provides the partition with one list element.
[...]
This is an example of chunk partitioning, with each chunk consisting of one element.
So far, so good. I'm reasonably certain that's what the code says, and I certainly see how the MoveNext
calls would get one element at a time.
But then it continues with:
By providing more elements at a time, you could reduce the contention over the lock and theoretically achieve faster performance.
And now I'm confused. That text makes me want to replace the yield return
with something that starts yield return new List<KeyValuePair<long, TSource>>
. This is obviously no good, but I have no idea how to get my target partitioner. Or even how to get a relatively simple partitioner, like one that unconditionally returns two elements at a time.
Given my workload, I'm very certain I want to start "providing more elements at a time" once I get past the first few heavy-weight tasks, but I can't figure out how I might do so. How do I change that sample code to supply chunks with more than one element?
Endnotes
Note 1: I realize my "first 32 tasks" is pretty arbitrary. Once I get a custom partitioner that does about what I want, I fully intend to tweak it to see what improves things and what doesn't.
Note 2: I realize I don't actually need a custom partitioner here. Shuffling the tasks would likely also solve the poor behavior of the default partitioner.
But this is a learning project for me, and I want to learn how to write a custom partitioner. Kowtowing to the default partitioner is a different project.