High memory usage with STXXL


I'm working on a project using STXXL, which I understand to be an out-of-core version of the C++ STL. My program currently runs fine with it, but while it is running it uses close to 2 GB of memory (with a small to medium-sized data set).

In my program, I'm using 25 STXXL vectors, stored in individual files on disk. As for my .stxxl file, I currently have it set to dynamically allocate the disk file (by setting the disk size to 0).

So, my question is: is there a way to explicitly get STXXL to use the hard disk as opposed to RAM? Or is this amount of memory usage to be expected when using this library?

Thanks in advance for any advice anyone can provide.


There are 2 answers

Timo Bingmann (best answer)

What bobb_the_builder says about the RAM usage of stxxl::vector is correct.

See the following code:

#include <stxxl/vector>

int main()
{
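    // create 25 vectors with a minimal cache:
    // 1 block per page, 1 cached page, 1 MiB blocks
    // (swap in the commented-out line to test the default parameters)
    //stxxl::VECTOR_GENERATOR<int>::result vector[25];
    stxxl::VECTOR_GENERATOR<int, 1, 1, 1*1024*1024>::result vector[25];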

    // fill vectors with integers
    for (size_t i = 0; i < 100 * 1024 * 1024 * 1024llu; ++i) {
        vector[i % 25].push_back(i);
    }

    return 0;
}

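On Linux, the program's resident memory size grows to 27528 KiB when using stxxl::VECTOR_GENERATOR<int, 1, 1, 1*1024*1024>::result, and to about 1.6 GiB when using stxxl::VECTOR_GENERATOR<int>::result, which with the default parameters is equivalent to stxxl::VECTOR_GENERATOR<int, 4, 8, 2*1024*1024>::result.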

Does the Windows Task Manager show the same? Is this maybe an STXXL bug that occurs only on Windows, or does the Task Manager simply report memory sizes differently?

Daniel F

I guess you are using the stxxl::VECTOR_GENERATOR template to create the 25 stxxl::vectors you mentioned in your post? The internal memory usage of an stxxl::vector depends on its configuration (i.e. block_size * page_size * cache_pages), as described in the STXXL documentation on stxxl::VECTOR_GENERATOR. Together, these parameters determine how much internal (i.e. main) memory each vector reserves. As far as I know, STXXL tries to allocate that much internal memory as a cache for each of your containers (if possible), depending on those template parameters.

Note: the default values for the aforementioned template parameters are:

page_size = 4 
cache_pages = 8 
block_size = 2 MiB

This results in a total memory consumption of 25 * (2 MiB * 4 * 8) = 1600 MiB, which explains a large part of the 2 GB memory consumption you reported.
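The arithmetic can be double-checked with a small compile-time sketch. It does not use STXXL at all: vector_cache_bytes, MiB, default_total and small_total are hypothetical names made up for this illustration, and the sketch assumes that the cache of one vector is simply page_size * cache_pages * block_size, as estimated above.

```cpp
#include <cstdint>

// Hypothetical helper (not part of STXXL): estimated cache size of one
// stxxl::vector, assuming it holds cache_pages pages of page_size blocks,
// each block being block_size bytes.
constexpr std::uint64_t vector_cache_bytes(std::uint64_t page_size,
                                           std::uint64_t cache_pages,
                                           std::uint64_t block_size)
{
    return page_size * cache_pages * block_size;
}

constexpr std::uint64_t MiB = 1024 * 1024;

// 25 vectors with the default parameters quoted above:
// page_size = 4, cache_pages = 8, block_size = 2 MiB.
constexpr std::uint64_t default_total = 25 * vector_cache_bytes(4, 8, 2 * MiB);
static_assert(default_total == 1600 * MiB,
              "25 default vectors should reserve about 1600 MiB");

// 25 vectors with the smallest configuration:
// page_size = 1, cache_pages = 1, block_size = 1 MiB.
constexpr std::uint64_t small_total = 25 * vector_cache_bytes(1, 1, 1 * MiB);
static_assert(small_total == 25 * MiB,
              "25 minimal vectors should reserve about 25 MiB");
```

The snippet has no main function; it compiles as a translation unit (e.g. with -std=c++11 -c), and the static_asserts fail at compile time if the numbers do not match. Plugging in other parameter combinations gives a rough upper bound for the cache memory before you pick the VECTOR_GENERATOR template arguments.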

(Note: which data type (ValueType) is stored in your STXXL vectors shouldn't really matter.)