Under a specific access pattern, I have found that a vector of vectors (std::vector<std::vector<Datum>>) may not release all of its allocated memory when it is destructed.
For example, in the following code:
#include <cmath>
#include <random>
#include <vector>

struct Datum { char payload[128]; };

constexpr int N {1000}; // number of partitions (placeholder value)

int main() {
    std::mt19937 gen(1234);
    std::normal_distribution rand(0., N / 10.);
    // Resident memory ~ 150 kB
    {
        std::vector<std::vector<Datum>> vv(N);
        for (int i {}; i < 10'000'000; ++i) {
            const auto n = static_cast<int>(std::abs(rand(gen))) % N;
            vv[n].push_back(Datum());
        }
        // Resident memory ~ 1.2 GB
    }
    // vv is destructed, but resident memory ~ 40 MB
}
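For reference, the resident-memory figures in the comments can be checked at runtime with a small Linux-specific helper like the one below (printRSS is my own name for it; it just echoes the VmRSS line from /proc/self/status):

#include <fstream>
#include <iostream>
#include <string>

// Print the current resident set size (Linux only) by scanning
// /proc/self/status for the VmRSS line.
void printRSS(const char* label) {
    std::ifstream status("/proc/self/status");
    for (std::string line; std::getline(status, line);) {
        if (line.rfind("VmRSS", 0) == 0) {
            std::cout << label << ": " << line << '\n';
            return;
        }
    }
}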
Even after vv is destructed, some of the memory that was allocated to it has not been returned to the OS.
(This is a minimal working example. The real code partitions radio astronomy data and ends up with a handful of highly populated partitions and a long tail, and neither the number of partitions nor their occupancy can be predicted ahead of time. I have approximated that behaviour here with a normal distribution; in practice the real code often retains a much higher proportion of the memory and eventually dies with an OOM error.)
I'm assuming this is caused by memory fragmentation within the C++ allocator (?), but I don't understand exactly why, and I would also like to understand how to mitigate the problem.
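One glibc-specific way to test that assumption is to ask the allocator to hand its free pages back to the OS after the inner scope ends; if resident memory then drops towards the baseline, the retained 40 MB was free-but-unreleased allocator memory rather than a leak. A sketch (malloc_trim is a glibc extension, so this is a diagnostic, not a portable fix):

#include <malloc.h> // glibc extension; declares malloc_trim

// ... at the end of main() in the example above, after the inner scope closes:
malloc_trim(0); // ask glibc to return free heap pages to the OS
// Resident memory should now fall back towards the ~150 kB baseline if the
// pages were merely retained by the allocator.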
Edit: some additional notes
- This was compiled using GCC 12.3.0
- I can resolve the situation if I call v.reserve(?) on each vector in vv with a sufficiently large capacity, which reinforces my belief that this is a memory fragmentation issue; a sketch of the workaround follows.
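Concretely, the workaround looks something like this (kExpectedMax is a placeholder for the "sufficiently large" capacity, which in my real code cannot be predicted ahead of time):

// Workaround sketch: pre-reserve each inner vector so that push_back
// never has to reallocate. kExpectedMax is a placeholder; the right
// value is workload-dependent and, in my case, hard to guess.
constexpr std::size_t kExpectedMax {100'000};
std::vector<std::vector<Datum>> vv(N);
for (auto& v : vv) {
    v.reserve(kExpectedMax);
}

Presumably this helps because a reservation that large (100,000 * 128 B = 12.8 MB) exceeds glibc's mmap threshold, so each inner buffer gets its own mmap region and is unmapped wholesale when the vector is destroyed, instead of being carved out of the shared heap.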