Efficiently writing and reading an array of 1s and -1s to and from a binary file


I am a computational-physics graduate student, and my research requires me to write a large array of values that are either 1 or -1 to one or more binary files. So far I have come up with the following MWE:

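#include <bitset>
#include <cstdint>
#include <fstream>
#include <sstream>
#include <string>

const int Num = 1024;   // must be a multiple of 32 for printToBinary below

// Convert state[start] .. state[start+count-1] into a string of '0'/'1'
// characters (-1 -> '0', +1 -> '1'). Any other value is silently skipped.
std::string int_array_to_string(int state[], int start, int count){
    std::ostringstream oss("");
    for (int i=start; i<start+count; i++)
        switch(state[i]){
            case -1: oss << 0; break;
            case  1: oss << 1; break;
        }
    return oss.str();
}
// Pack each block of 32 values into one 32-bit word and write it.
// std::uint32_t replaces unsigned long, which is 8 bytes on many 64-bit
// platforms and would waste half of every word written to the file.
void printToBinary(int state[], std::ostream &output){
    for (int i=0; i<Num; i+=32){
        std::bitset<32> x( int_array_to_string(state, i, 32));
        std::uint32_t n = static_cast<std::uint32_t>(x.to_ulong());
        output.write(reinterpret_cast<const char*>(&n), sizeof(n));
    }
}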
void fakeUpSomeData(int state[]){
    int ans = 1;
    for (int i=0; i<Num; i++){
        ans *= -1;
        state[i] = ans;
    }
}
int main(void){
    int state[Num] = {0};
    fakeUpSomeData(state);

    std::ofstream output("output.bin", std::ios::binary);

    printToBinary(state, output);

    return 0;
}

This, however, makes my program run three times slower than before, and I'm certain there must be a better way to do this.
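One plausible source of the slowdown is that every 32 values in the MWE pass through a std::ostringstream, a std::string and a std::bitset before anything is written. A minimal sketch of an alternative, assuming one bit per value (+1 stored as a set bit, -1 as a clear bit, least significant bit first, a layout chosen here for illustration), packs the bits directly with shifts and writes the whole buffer in a single call. The names packSpins, unpackSpins and writeSpins are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <ostream>
#include <vector>

// Pack n values of +1/-1 into bytes, one bit per value (hypothetical layout):
// value i goes to bit (i % 8) of byte (i / 8); +1 sets the bit, -1 leaves it 0.
std::vector<std::uint8_t> packSpins(const int* state, std::size_t n) {
    std::vector<std::uint8_t> bytes((n + 7) / 8);  // value-initialized to 0
    for (std::size_t i = 0; i < n; ++i) {
        assert(state[i] == 1 || state[i] == -1);
        if (state[i] == 1)
            bytes[i / 8] |= static_cast<std::uint8_t>(1u << (i % 8));
    }
    return bytes;
}

// Inverse of packSpins: recover n values of +1/-1 from the packed bytes.
std::vector<int> unpackSpins(const std::vector<std::uint8_t>& bytes, std::size_t n) {
    assert(bytes.size() * 8 >= n);
    std::vector<int> state(n);
    for (std::size_t i = 0; i < n; ++i)
        state[i] = ((bytes[i / 8] >> (i % 8)) & 1u) ? 1 : -1;
    return state;
}

// Write the entire packed array with one write() call instead of one per 32 values.
void writeSpins(const int* state, std::size_t n, std::ostream& out) {
    const std::vector<std::uint8_t> bytes = packSpins(state, n);
    out.write(reinterpret_cast<const char*>(bytes.data()),
              static_cast<std::streamsize>(bytes.size()));
}
```

Reading back is the mirror image: read (Num + 7) / 8 bytes with std::istream::read and pass them to unpackSpins. Whether this removes the whole slowdown is something to measure on the real workload.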

Additionally, it would be useful to be able to read the data back in chunks later. For example, if I store the three states

{1,-1,1}
{1,-1,1}
{1,1,-1}

in one file, it would be useful to have a method that reads the first chunk, then the second, then the third.
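If every state in a file has the same length N, chunked reading needs no extra bookkeeping: with one bit per value, each state occupies (N + 7) / 8 bytes, so state k starts at byte offset k * ((N + 7) / 8) and can be reached with seekg. A minimal sketch under those assumptions (fixed N, +1 as a set bit, -1 as a clear bit, least significant bit first); appendChunk, readChunk and bytesPerChunk are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <vector>

// Hypothetical fixed-size layout: every state has exactly n values and
// occupies bytesPerChunk(n) bytes, so chunk k starts at k * bytesPerChunk(n).
std::size_t bytesPerChunk(std::size_t n) { return (n + 7) / 8; }

// Append one state (+1 -> bit set, -1 -> bit clear) to the end of the stream.
void appendChunk(std::ofstream& out, const std::vector<int>& state) {
    std::vector<unsigned char> bytes(bytesPerChunk(state.size()));  // zeroed
    for (std::size_t i = 0; i < state.size(); ++i) {
        assert(state[i] == 1 || state[i] == -1);
        if (state[i] == 1)
            bytes[i / 8] |= static_cast<unsigned char>(1u << (i % 8));
    }
    out.write(reinterpret_cast<const char*>(bytes.data()),
              static_cast<std::streamsize>(bytes.size()));
}

// Read chunk k (0-based) of n values by seeking straight to its offset,
// without reading the chunks before it.
std::vector<int> readChunk(std::ifstream& in, std::size_t k, std::size_t n) {
    std::vector<unsigned char> bytes(bytesPerChunk(n));
    in.seekg(static_cast<std::streamoff>(k * bytes.size()), std::ios::beg);
    in.read(reinterpret_cast<char*>(bytes.data()),
            static_cast<std::streamsize>(bytes.size()));
    assert(in.gcount() == static_cast<std::streamsize>(bytes.size()));
    std::vector<int> state(n);
    for (std::size_t i = 0; i < n; ++i)
        state[i] = ((bytes[i / 8] >> (i % 8)) & 1u) ? 1 : -1;
    return state;
}
```

For the three example states above, each chunk fits in a single byte, and readChunk(in, 2, 3) returns {1, 1, -1}. If states can differ in length, a header or a separate index of offsets would be needed, which this sketch does not cover.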

A bit of background on why I need this: I will need to store roughly 1024*1e5 up to 9632*1e6 of these values to calculate low- and high-resolution predictions for neutron scattering. So being able to read out chunks of some size N would be far better than storing 1e6 separate binary files in a folder (just typing that option sounds ridiculous!).
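For scale, assuming one bit per value on disk and a 4-byte int in memory (typical, though not guaranteed by the C++ standard), the two ends of that range work out as follows, a factor of 32 apart in each case:

```latex
\begin{aligned}
1024 \times 10^{5}\ \text{values} \times \tfrac{1}{8}\ \tfrac{\text{byte}}{\text{value}} &= 1.28 \times 10^{7}\ \text{bytes} \approx 12.8\ \text{MB}
  &\quad&\text{vs.}\quad 4.096 \times 10^{8}\ \text{bytes} \approx 410\ \text{MB as int},\\
9632 \times 10^{6}\ \text{values} \times \tfrac{1}{8}\ \tfrac{\text{byte}}{\text{value}} &= 1.204 \times 10^{9}\ \text{bytes} \approx 1.2\ \text{GB}
  &\quad&\text{vs.}\quad 3.85 \times 10^{10}\ \text{bytes} \approx 38.5\ \text{GB as int}.
\end{aligned}
```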

Finally, I have considered using the HDF5 library, but it seems like overkill, and I was unable to get an MWE working with it.

Any thoughts on how to improve the MWE would be appreciated. Thank you for your time.

Accepted answer by Ian Dingle:

Check out this answer: Writing a binary file in C++ very fast

In summary, try using C-style I/O: that is, forget about output streams and use open() and write() to write directly to the file descriptor.
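A minimal sketch of that suggestion, assuming a POSIX system (for example Linux or macOS) where fcntl.h and unistd.h are available; open(), write() and close() are POSIX calls rather than ISO C. writeAllPosix is a hypothetical helper that writes an already-packed byte buffer, looping because write() may write fewer bytes than requested.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <vector>
#include <fcntl.h>   // open, O_* flags
#include <unistd.h>  // write, read, close

// Write all of `bytes` to `path`, creating or truncating the file.
// Returns true on success, false if open, write or close fails.
bool writeAllPosix(const char* path, const std::vector<std::uint8_t>& bytes) {
    assert(path != nullptr);
    int fd = ::open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return false;
    std::size_t done = 0;
    while (done < bytes.size()) {
        ssize_t w = ::write(fd, bytes.data() + done, bytes.size() - done);
        if (w < 0) {
            if (errno == EINTR) continue;  // interrupted by a signal: retry
            ::close(fd);
            return false;
        }
        done += static_cast<std::size_t>(w);  // handle partial writes
    }
    return ::close(fd) == 0;
}
```

Much of the gain likely comes from issuing a few large writes instead of many small ones, which a single large std::ostream::write call can also achieve, so it is worth timing both before committing to raw file descriptors.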

You could even use read() with a buffer whose size equals the number of bytes needed to store one NxN binary state, and read the states in one at a time.
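A minimal sketch of that chunked read, assuming a POSIX system and states packed at one bit per value, so a state of N values occupies (N + 7) / 8 bytes. readNextChunk is a hypothetical helper; it loops because read() may return fewer bytes than requested even before end of file.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <vector>
#include <fcntl.h>   // open (used by callers to obtain fd)
#include <unistd.h>  // read, write, close

// Try to fill `chunk` completely from file descriptor fd, looping over
// short reads. Returns true only if a full chunk was read; returns false at
// end of file, on a read error, or if the file ends in the middle of a chunk.
bool readNextChunk(int fd, std::vector<std::uint8_t>& chunk) {
    assert(fd >= 0 && !chunk.empty());
    std::size_t got = 0;
    while (got < chunk.size()) {
        ssize_t r = ::read(fd, chunk.data() + got, chunk.size() - got);
        if (r < 0 && errno == EINTR) continue;  // interrupted by a signal: retry
        if (r <= 0) break;                      // 0 = end of file, -1 = error
        got += static_cast<std::size_t>(r);
    }
    return got == chunk.size();
}
```

A typical loop sizes the buffer once, as std::vector<std::uint8_t> chunk((N + 7) / 8), and calls while (readNextChunk(fd, chunk)) { ... } to process one state per iteration. For random access to state k, POSIX pread() with offset k * chunk.size() avoids a separate lseek().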