I am learning about memory pools and trying to use `boost::pool_allocator` in my project. Following the documentation, I wrote a small test of the time cost:
```cpp
#include <chrono>
#include <iostream>
#include <vector>

#include <boost/pool/pool_alloc.hpp>

template <typename Alloc>
void test()
{
    using namespace std::chrono;
    auto t0 = high_resolution_clock::now();
    for (int i = 0; i < 1000; ++i) {
        std::vector<int, Alloc> vec;
        for (int j = 0; j < 10000; ++j)
            vec.push_back(i + j);
    }
    auto t1 = high_resolution_clock::now();
    auto time_ms = duration<double>(t1 - t0).count() * 1e3;
    std::cout << "time cost: " << time_ms << " ms" << std::endl;
}

int main()
{
    test<std::allocator<int>>();
    test<boost::pool_allocator<int>>();
}
```
And the result is:
time cost: 3.97602 ms
time cost: 91.3943 ms
The Boost doc says:
Pools are generally used when there is a lot of allocation and deallocation of small objects.
So I expected `boost::pool_allocator` to cost less time than `std::allocator` in the code above, but the test result shows it is much worse.
Am I using `boost::pool_allocator` wrong? In what situations can I obtain a speedup from a memory pool (or specifically from Boost `pool`/`pool_allocator`)?
For reference:
I thought maybe the locking was to blame. Also, there could be better hinting.
Let's test! Live On Compiler Explorer
Still clearly slower in every case. Let's profile.
Profiler
Comparing just the standard allocator to `boost::pool_allocator<int, boost::default_user_allocator_new_delete, std::mutex, 1000u, 0u>`: the standard allocator causes 48k calls to `new`/`delete`, resulting in as many underlying calls to `malloc`/`free`; the pool allocator shows vastly reduced numbers, both for `new`/`delete` and for `malloc`/`free`.
So far so good! What is taking so much time then?
Where
`unordered_malloc` inlines into a big number of lines from various Boost Pool headers. The top offenders are inlined from `boost/pool/simple_segregated_storage.hpp` (the second column is the percentage of cost relative to the parent). Those lines are in `try_malloc_n`, which self-describes as:
It does appear, then, that the chasing of free blocks on the segregated heap is too costly in this type of scenario after all. A little napkin calculation shows that `try_malloc_n` takes 99.75% of the higher-level `unordered_malloc` call we saw earlier.
SHOCK: Alternative Implementation?
During my investigations I found a number of defines that can be used to get more insight, e.g.:
Now, I used VALIDATE/INSTRUMENT, observing the expected effect (very verbose output and a slight performance degradation).
From reading the name/code I would have expected `BOOST_POOL_VALGRIND` to similarly degrade performance (after all, it should probably do extra work to avoid false-positive memory errors when running under Valgrind, right?). Much to my surprise, defining it makes the whole thing run lightning fast: Live On Compiler Explorer
Sadly, looking at the details confirms that it is cheating by delegating to the standard library directly (while adding some overhead with sets of `free_list`/`used_list` addresses).
Summary
Yes, the standard `pool`/`simple_segregated_storage` implementation performs badly under this load. Whether or not this is actually a bug I cannot tell with certainty, but it sure seems like it, going by the documentation (which you mentioned as well).