How to avoid extra space allocation when using standard collections and storing million items

203 views Asked by At

I'm using standard collections like List< T >, HashSet< T > and Dictionary< TKey, TValue > to store 100s thousand of items, up to million. Use case specific is that items set being huge in the beginning grows very slow and stays in memory for a long time. So the problem I'm facing (despite LOH allocation/fragmentation which is another issue) is that these collections consume a lot of memory due to their internal logic which doubles allocated memory every time it runs out of free space (actually it doubles and looks for nearest prime number). In my case, where adds are rare, keeping all that extra memory is wasteful.

Here is simplified version how I deal with List:

public static void Add_GrowSlow<T>([NotNull] this List<T> list, T item, int growStep)
{
    if (list == null) throw new ArgumentNullException(nameof(list));
    if (growStep <= 0)
        throw new ArgumentOutOfRangeException(nameof(growStep));

    var count = list.Count;
    if (list.Capacity == count)
    {
        if (count > 10000)
        {
            list.Capacity = count + growStep;
        }
    }

    list.Add(item);
}

But I don't know how to deal with HashSet/Dictionary without reflection. Could you suggest any way or collection to avoid such problem? I looked at PowerCollections but found no solution for this problem.

UPDATED: I want to clarify what kind of answer I'd like to get: the name of nuget package or link to article with source code where collections, which allow to control how they grow, are implemented. Because obvious solution for my problem is to copy source code from BCL and make those existing methods protected virtual:

class HashSet<T>
{
    // ...
    protected virtual void IncreaseCapacity() {...}
}

class Dictionary<TKey, TValue>
{
    // ...
    protected virtual void Resize() {...}
}

And I don't know why they didn't do this in BCL from the beginning. The cost of virtual call is nothing compared to reallocation and copying data which happens when collection resizes. May be I should create pull request?... :)

0

There are 0 answers