Find next available chunk in a memory pool


So, I've been spending some time implementing a memory pool class in C++. Except for some minor problems along the way, it's gone fairly well. However, when I tried testing it today by allocating 1000 chunks, first using the memory pool and then comparing that to using new, I was actually getting close to three times worse performance (in nanoseconds) when using the memory pool. My allocation method looks like this:

template <class T> T* MemPool<T>::allocate()
{
    Chunk<T>* tempChunk = _startChunk;

    while (tempChunk->_free == false)
    {
        if (tempChunk->_nextChunk == NULL)
            throw std::runtime_error("No available chunks");

        tempChunk = tempChunk->_nextChunk;
    }

    tempChunk->_free = false;
    return &tempChunk->object;
}

I am starting at the first chunk in the pool and doing a search through the pool's linked list until I find a free chunk, or reach the end of the pool. Now, the bigger the pool, the longer this will take as the search has an O(n) time complexity where n is the number of chunks in the pool.

So I was curious whether anyone has any thoughts on how to improve the allocation. My initial thought was to use two linked lists instead of just one, where one contains free chunks and the other allocated chunks. When a new chunk is to be allocated, I would simply take the first element of the free list and move it to the allocated list. As far as I can see, this would eliminate the need to do any searching when allocating, leaving only deallocation requiring a search to find the correct chunk.

Any thoughts are appreciated as this is my first time working directly with memory in this way. Thanks!


There are 2 answers

Peter (Best answer)

Instead of using a hand-crafted linked list, it would probably be more effective to use a std::list (particularly if you use it with a custom allocator). It's less error-prone, and probably better optimised.

Using two lists allows a lot of simplification. There is no need to track, in the list itself, whether a chunk is free or not - since that is specified by which list the chunk is in (all that is needed is to ensure a chunk doesn't somehow appear in both lists).

Your current implementation means you are having to walk the linked list, both when allocating and deallocating.

If the chunks are fixed size, then allocation would simply be implemented by moving the first available chunk from the free list to the allocated list - no need to search. To deallocate a chunk, you would still need to find it in the allocated list, which means you would need to map a T* to an entry in the list (e.g. perform a search), but the act of deallocation itself is then simply moving the entry from one list to the other.

If the chunks are variable size, you'll need to do a bit more work. Allocation would require finding a chunk that is at least the requested size. Overallocating (allocating a larger chunk than needed) would make allocation and deallocation more efficient in terms of performance, but also mean that fewer chunks can be allocated from the pool. Alternatively, break a large chunk (from the free list) in two, and place one entry on each list (representing the allocated part, and the part left unallocated). If you do this, when deallocating, it may be desirable to merge chunks that are adjacent in memory (effectively, implement defragmentation of the free memory in the pool).

You will need to decide whether the pool can be used from multiple threads, and use appropriate synchronisation.

Ver Greeneyes

Use a fixed number of size bins, and make each bin a linked list.

For instance, let's say your bins are simply the integer multiples of the system page size (usually 4KiB), and you use 1MiB chunks; then you have 1MiB/4KiB = 256 bins. If a free makes an n-page region available in a chunk, append it to bin n. When allocating an n-page region, walk through the bins from n to 256 and choose the first available chunk.

To maximize performance, associate the bins with a bitmap, then scan from bit n-1 to bit 255 to find the first set bit (count the leading or trailing zeroes using compiler intrinsics like __builtin_clz and _BitScanForward). That's still not quite O(1) due to the number of bins, but it's pretty close.

If you're worried about memory overhead, you could append each chunk only once for each bin. That is, even if a chunk has 128 1-page regions available (maximally fragmented), bin 1 will still only link to the chunk once and reuse it 128 times.

To do this you'd have to link these regions together inside each chunk, which means each chunk will also need to store a list per size bin - but this can be more memory efficient, because there are at most 256 valid offsets inside each chunk, whereas the bin lists need to store full pointers.

Note that either way, if you don't want the free space inside each chunk to get fragmented, you'll want a quick way to remove chunks from bins in your list - which means using doubly linked lists. Obviously that adds additional memory overhead, but it might still be preferable to doing periodic free space defragmentation on the whole list.