I have been coding C++ for many years. Most of my apps are built for solving problems in computational science/statistics/machine learning, so I have been extensively dealing with std::vectors storing large data. During these years, there's always an unconformable feeling like a splinter on the back of my head whenever I have the need for allocating temporary std::vectors inside a function. It incurs large overhead if the function is frequently called, and gets worse if the container is large or if the function runs in many threads.
My usual tactic is to write functions as functors, and set the temporary vectors as data members so they will be reused if the next call demands no greater capacity. If the memory resource is limited, I would need to manually scan the code, seek and elicit vectors of the same type elsewhere to reuse them.
For some larger apps, I often write a standalone struct VecPool that contains vector< vector<typeNeeded> >s. Temporary containers are fetched and returned during computation. In this case, I need to add new members to VecPool when new types show up, and it becomes messy or outright impossible if I want to templatize my class/function. The memory recycling efficiency is also low because containers of different types cannot be shared.
C++ standard pretty much prevents reusing a memory buffer initialized with a different type. For years I have been wanting to have a VecPool that recycles temporary std::vectors in the most parsimonious manner. Recently I finally made the determination to remove the splinter once for all.
I have the following demands:
auto x = VecPool::give<T>(size1, ...)will send me a vector or a nested vector. The innermost data type isT. For example,auto x = VecPool::give<std::pair<double, std::set<int, float>>>(3, 2, 8, 5);should yield avector<vector<vector<vector<std::pair<double, std::set<int, float>>>>>>. In total,xis a composition of 1 + 3 + 3 x 2 + 3 x 2 x 8 = 58vectors of various types. FunctionVecPool::giveshould fetch 58vectors from the pool, constructxand return it.VecPool::recall(x)will extract the 58vectors inx, and push them back to the pool in the same order of them being pulled previously.
To achieve my goal, I used std::vector<std::vector<char>> as the pool container with a little hack. The implementation has minimal violation to the C++ standard. Here are some important considerations:
Standard >= C++17.
Memory alignment. This will not be an issue since the standard requires
mallocreturn address that is at leastsizeof(size_t)byte aligned.How
std::vectoris implemented. The pool constructor shouldstatic_assertthat the layout ofvector's header must be a 3-pointer. After the pointers are converted to integersi0, i1, i2,i1 - i0must equal the vector's size times the byte size of the element type;i2 - i0must equal the vector's capacity times the byte size of the element. If these requirements are not fulfilled, throw a runtime error.When a
vector<char>in the pool is adjusted to containTbefore being dispatched, we need to adjust the 3rd pointer in thevectorheader to ensure the byte capacity is a multiple of the byte size of the element.
The following is my implementation. Here is a copy.
#include <vector>
#include <cstring>
#include <stdlib.h>
#include <time.h>
template<typename DestPtr, typename SourPtr>
inline void mmcp(DestPtr dest, SourPtr src, std::size_t Nbyte)
{
std::memcpy((void*)dest, (void*)src, Nbyte);
}
template<typename T>
struct isVector { constexpr bool operator()() { return false; } };
template<typename T, typename A>
struct isVector<std::vector<T, A>>
{
constexpr bool operator()() { return true; };
};
// =============================================================================
// VecPool is not thread-safe. Create a VecPool for each thread.
// =============================================================================
class VecPool
{
private:
std::vector<std::vector<char>> X;
// ===========================================================================
// Convert vector<T> to vector<char> that will be throw in pool.
// ===========================================================================
template<typename T>
std::vector<char> T2char(std::vector<T> &x)
{
static_assert(sizeof(x) == sizeof(std::vector<char>));
x.resize(0); // Call destructors on all elements but keep capacity.
std::vector<char> rst;
char mem[sizeof(x)]; // Start swapping.
mmcp(mem, &rst, sizeof(x));
mmcp(&rst, &x, sizeof(x));
mmcp(&x, mem, sizeof(x));
return rst;
}
// ===========================================================================
// Convert vector<char> in pool to vector<T>.
// ===========================================================================
template<typename T>
std::vector<T> char2T(std::vector<char> &x)
{
static_assert(sizeof(x) == sizeof(std::vector<T>));
std::vector<T> rst;
std::size_t mem[3];
mmcp(mem, &x, sizeof(x));
// =========================================================================
// Adjust the capacity pointer to ensure the capacity is a multiple
// of sizeof(T):
// =========================================================================
mem[2] -= (mem[2] - mem[0]) % sizeof(T);
mmcp(&x, &rst, sizeof(x));
mmcp(&rst, mem, sizeof(x));
return rst;
}
// ===========================================================================
// Dispatch core function.
// ===========================================================================
template<typename T>
std::vector<T> giveCore(std::size_t size)
{
if (X.size() == 0)
{
std::vector<T> rst(size + 1); // Always prefer a little bit larger capacity.
rst.pop_back();
return rst;
}
auto rst = char2T<T> (X.back());
rst.resize(size + 1); // Always prefer a little bit larger capacity.
rst.pop_back();
X.pop_back();
return rst;
}
// ===========================================================================
// Recall the vector of arbitrary type. The function will first clear
// the vector --- call desctructor on all elements, and then take the
// container back to pool.
// ===========================================================================
template<typename T>
void recallCore(std::vector<T> &x) // Invalidates everything related to x.
{
x.resize(0); // Call destructor on all elements.
X.emplace_back(std::vector<char>(0));
T2char(x).swap(X.back());
}
public:
VecPool()
{
std::srand(std::time(nullptr));
test0();
test1();
test2();
std::vector<std::vector<char>>(0).swap(X);
}
template <typename T, typename... Args>
auto give(std::size_t size, Args... restSizes)
{
if constexpr (sizeof...(restSizes) == 0) return giveCore<T> (size);
else
{
auto rst = giveCore< decltype( give<T>(restSizes...) ) > (size);
for (auto &x: rst) give<T>(restSizes...).swap(x);
return rst;
}
}
template <typename T>
void recall(std::vector<T> &x)
{
if constexpr (isVector<T>()())
{
for (auto i = x.rbegin(); i != x.rend(); ++i) recall(*i);
recallCore(x);
}
else recallCore(x);
}
void test0(); // Test if std::vector header is implemented as 3-pointer.
void test1(); // Test if vectors fetched from or recalled to VecPool are handled correctly.
void test2(); // Test if nested vectors can be handled correctly.
};
void VecPool::test0()
{
static_assert( sizeof(std::size_t) == sizeof(void*) );
// ===========================================================================
// Test if std::vector is implemented in the way that can be exploited.
// ===========================================================================
unsigned fullSize = std::rand() % 31 + 17;
unsigned subSize = fullSize - std::rand() % 13;
typedef std::tuple<char, char, char> tupe; // Arbitrarily selected type.
std::vector<tupe> a(fullSize); // Just for creating a vector of an arbitrary type.
a.resize(subSize); // Downsize the vector.
// ===========================================================================
// To make VecPool work, vector header must contain exactly 3 pointers,
// e.g. p1, p2, p3.
// ===========================================================================
static_assert( sizeof(a) == sizeof(std::size_t) * 3 );
auto ptr = (std::size_t*)(&a); // Read but won't write.
bool sizePointerCool = ptr[0] + sizeof(tupe) * subSize == ptr[1];
bool capacityPointerCool = ptr[0] + sizeof(tupe) * fullSize == ptr[2];
if (!sizePointerCool or !capacityPointerCool) throw std::runtime_error(
"VecPool initialization: Layout of std::vector header is unsupported.\n");
}
// ===========================================================================
// Test 1 to run during initialization. Check memory addresses and if the
// container dispatched from VecPool can call push()/emplace_back().
// ===========================================================================
void VecPool::test1()
{
std::size_t asize = std::rand() % 7 + 3;
auto a = give<float> (asize);
a.resize(asize);
void *aptr = a.data();
int bsize = std::rand() % 5 + 3;
auto b = give<std::tuple<short, short, short> >(bsize);
void *bptr = b.data();
recall(b);
recall(a);
if ( (void*)(X.back().data()) != aptr ) throw std::runtime_error(
"VecPool initialization: Last in pool is a different container.\n");
if ( (void*)(X[X.size() - 2].data()) != bptr ) throw std::runtime_error(
"VecPool initiailization: Second last in pool is a different container.\n");
typedef std::tuple<char, char, char, char, char, char, char> tupe;
int lastContainerByteCapa = X.back().capacity();
int pushTimes = lastContainerByteCapa / sizeof(tupe) + 5;
auto c = give<tupe>(0);
// =========================================================================
// Test emplace_back() beyond the vector's capacity.
// =========================================================================
for (int i = 0; i < pushTimes; ++i) c.emplace_back(tupe());
int dsize = c.capacity() - c.size() + 3;
auto d = give<double>(dsize);
auto dptr = d.data() + 1;
recall(d);
if (std::size_t(X.back().data()) + sizeof(double) != std::size_t(dptr) )
throw std::runtime_error(
"VecPool initiailization: Last in pool is a different container -- 2.\n");
}
// Find address of the first element in a vector. Used in test2().
template <typename T>
void FindAddrsOfAllVectorsInNestedVector(T &x, std::vector<std::size_t> &addrs)
{
if constexpr (!isVector<T>()()) return;
else
{
static_assert( sizeof(std::size_t) == sizeof(x.data()) );
addrs.emplace_back(std::size_t(x.data()));
for (auto &u: x) FindAddrsOfAllVectorsInNestedVector(u, addrs);
}
}
// ===========================================================================
// Test 2 to run during initialization. Test if VecPool correctly deals with
// nested vectors.
// ===========================================================================
void VecPool::test2()
{
int a[7];
for (int i = 0; i < 7; ++i) a[i] = rand() % 3 + 1;
auto x = give<float> ( // x is a 7-d vector.
a[0], a[1], a[2], a[3], a[4], a[5], a[6]);
std::vector<std::size_t> addresses;
FindAddrsOfAllVectorsInNestedVector(x, addresses);
recall(x);
for (int i = X.size() - 1, j = 0; i >= 0; --i, ++j)
{
auto addr = std::size_t(X[i].data());
if (addresses[j] != addr) throw std::runtime_error(
"VecPool initialization: Nested vector test failed.\n");
}
}
// =============================================================================
// Usage pattern goes like this:
// =============================================================================
// VecPool vp;
// auto a = vp.give<int> (3);
// auto b = vp.give<double>(5, 7);
// auto c = vp.give<std::tuple<short, short, short> >(2);
//
// ......
//
// vp.recall(c); // Recalls are not necessary but good to do.
// vp.recall(b);
// vp.recall(a);
// =============================================================================
// It is strongly recommended that the first dispatched gets recalled last,
// so the next code run will always utilize the same containers.
// =============================================================================
// ===========================================================================
// Code pattern for creating a vector of vectors of different sizes.
// ===========================================================================
// auto v = vp.give< std::vector<std::vector<int>> > (3); // A vector of vector of vectors.
// for (int i = 0; i < 3; ++i)
// {
// vp.give<std::vector<int>>(i * 2 + 1).swap(v[i]);
// for (int j = 0, jend = v[i].size(); j < jend; ++j)
// vp.give<int>(j + 5).swap(v[i][j]);
// }
I have run extensive tests and have plugged it in my projects. There has been no problem so far.
Besides the obvious problem of the code, can anyone write a standard-conformant test case that will break it, i.e. make it go wrong? If you have better ideas of achieving what I want, I would be more than happy to hear about them!
Replies to some comments:
My goal is not just about reusing memory, but really reusing vectors which have so many member functions that I like, e.g. emplace_back(). Of course it is not the most efficient function if you did not reserve space, but there are many situations where you cannot guess a sufficient capacity.
Nested vectors are indeed not the most efficient, but there are many occasions when using data cube is a massive waste of memory or downright infeasible.
Your code is super dangerous because you're
memcpying around internals ofstd::vector, which is... not safe. But your real goal appears to be simply to reuse memory rather than reusing vectors. C++ actually has built-in facilities for this: customizable allocators. What you're really wanting is a pool allocator, such as those found in boost.If, however, you want to make your code work, then what you want is for something like this to work:
And then the
std::vectorwill use thevpfor allocating and deallocating memory. You'll want to make aVecPoolobject that can maintain a list of freed memory and attempt to reuse that memory for subsequent memory allocations. Something like this:You'll also want an allocator object that tells the
std::vectorhow to interact with yourVecPool:And that's the basic functionality. You just have to make sure to always pass the
vpas the last parameter to thestd::vectorconstructors.However, the need to write
std::vector<std::vector<int, VecPoolAllocator<int>>, VecPoolAllocator<std::vector<int, VecPoolAllocator<int>>>>and similar is obnoxious for parameters and return types and such, so since you mention that nested vectors is common for your use case, you might as well make helper methods for those:And then you can pass around
pool_vector<int,2>instead, which is far easier to type and understand.Oh, and you wanted
give<T>(size_t...)methods as well? Done.http://coliru.stacked-crooked.com/a/ccfc359a1819872f
As a final note, since this is an abstract allocator, you can actually use it with other structures too.
And it'll simply reuse memory allocations between all of these, no problem.