I have a very large collection that implements the generic IList<T>
interface and contains tens of millions of elements, and I would like to process them in parallel using PLINQ. I noticed that the overhead of parallelism is quite significant because processing each individual element is very lightweight, so I am searching for ways to chunkify the processing by splitting the IList<T>
into small segments. My goal is to have finally something like this:
IList<Person> source = GetAllPersons();
double averageAge = source
.Segmentate(1000) // Hypothetical operator that segmentates the source
.AsParallel()
.Select(segment => segment.Select(person => (double)person.CalculateAge()).Sum())
.Sum() / source.Count;
I could use the Batch
operator from the MoreLinq library, or any of the answers from many related questions, but all of these solutions are allocating multiple small arrays (or lists), and are copying the source into these containers, and I don't want that. In my case I have the additional requirement of keeping the garbage collector idle as much as possible.
I noticed that the .NET has the ArraySegment
type, that seems to fit perfectly with my requirements:
// Delimits a section of a one-dimensional array.
public readonly struct ArraySegment<T> : ICollection<T>, IEnumerable<T>,
IEnumerable, IList<T>, IReadOnlyCollection<T>, IReadOnlyList<T>
I could use this type to implement the allocation-free Segmentate
operator like this:
/// <summary>Segmentates the source array into sized segments.</summary>
public static IEnumerable<ArraySegment<T>> Segmentate<T>(this T[] source, int size)
{
for (int offset = 0; offset < source.Length; offset += size)
{
yield return new ArraySegment<T>(source, offset,
Math.Min(size, source.Length - offset));
}
}
But I can't use this type because my source is an IList<T>
and not an array. And copying it to an array is not really an option, because the source is mutated frequently. And creating new array copies all the time is against my requirements.
So I am searching for a ListSegment<T>
type, but as far as I can see it doesn't exist in .NET. Do I have to implement it myself? And if so, how? Or is any other way to segmentate an IList<T>
without causing allocations?
Clarification: My source collection is not a List<T>
. It is a custom class that implements the IList<T>
interface.
You need to implement an
ArraySegment<T>
equivalent forIList<T>
. See implementation below. For optimal performance, consider using spans instead.ListSegment<T> Struct