Efficiently serialise (and read) an int array from Node.js


I'm considering building an application in Node.js which would need to stream large (over a gigabyte) files containing an array of integers. Crucially, the array needs to be serialised compactly: not ASCII-based, ideally using 8 bits for smaller integers (which would be the vast majority of the data) while still being able to represent larger numbers.
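To make the size requirement concrete, here is a minimal sketch of one scheme with these properties, a variable-length quantity (VLQ) encoding: values below 128 take one byte, larger values take more. The function names `encodeVlq` and `encodeArray` are hypothetical, and the sketch assumes non-negative integers within JavaScript's safe integer range.

```javascript
// Encode one non-negative integer as a VLQ: 7 payload bits per byte,
// most significant group first, high bit set on every byte except the last.
function encodeVlq(value) {
  if (!Number.isSafeInteger(value) || value < 0) {
    throw new RangeError('expected a non-negative safe integer');
  }
  const bytes = [value % 128];            // final byte: high bit clear
  value = Math.floor(value / 128);
  while (value > 0) {
    bytes.unshift((value % 128) | 0x80);  // continuation byte: high bit set
    value = Math.floor(value / 128);
  }
  return bytes;
}

// Encode a whole array into a single Buffer (hypothetical helper).
function encodeArray(values) {
  const out = [];
  for (const v of values) {
    out.push(...encodeVlq(v));
  }
  return Buffer.from(out);
}

// encodeArray([0, 127, 128, 300]) yields the bytes 00 7f 81 00 82 2c
```

For a multi-gigabyte file the output would presumably be written in pieces to a write stream rather than collected in one Buffer; the per-integer encoding stays the same.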

This question is maybe about more than nodejs, but how does one go about this in nodejs? Are there readily available solutions for streaming files with custom byte encodings from disk? Or better, integer arrays?

Ideally, decoding each part of the stream should be disk-bound rather than CPU-bound, even with an SSD.

There is 1 answer

Nat On BEST ANSWER

I feel silly for not diving into the documentation first (the purpose of this project is for me to learn nodejs after all).

Turns out the default behaviour of the File System module looks up to the job, though I haven't implemented the variable-length quantity decoding part or tested it for speed yet.

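var fs = require('fs');
var rs = fs.createReadStream('/Path/to/file');
var bufferSize = 10; // bytes requested per read() call

// read() returns null until data has been buffered, so wait for 'readable'
rs.on('readable', function () {
  var buffer, i, byte;

  // drain what is buffered; read() returns null when fewer than bufferSize
  // bytes are available, except at end of file, where it returns the remainder
  while ((buffer = rs.read(bufferSize)) !== null) {
    for (i = 0; i < buffer.length; i++) {
      byte = buffer[i];
      // interpret byte given as integer according to 'variable-length quantity' encoding
    }
  }
});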

http://en.wikipedia.org/wiki/Variable-length_quantity
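For the decoding step left as a comment in the loop above, a minimal sketch might look like the following. It assumes the big-endian VLQ layout described on the linked page (7 payload bits per byte, high bit set on all but the last byte) and non-negative values within the safe integer range. `createVlqDecoder` is a hypothetical helper name, not part of Node.js.

```javascript
// Returns a function that accepts successive chunks (Buffer or Uint8Array)
// and returns the integers completed within each chunk. The partially
// decoded value is kept in the closure, so an integer whose bytes straddle
// a chunk boundary is still decoded correctly.
function createVlqDecoder() {
  let pending = 0; // value accumulated so far for the integer in progress

  return function decodeChunk(chunk) {
    const values = [];
    for (const byte of chunk) {
      // Shift the accumulated value left by 7 bits and add the new payload.
      // Multiplication is used instead of << so values above 2^31 survive.
      pending = pending * 128 + (byte & 0x7f);
      if ((byte & 0x80) === 0) { // high bit clear: last byte of this integer
        values.push(pending);
        pending = 0;
      }
    }
    return values;
  };
}

// Example: the bytes of 128 (81 00) are split across two chunks.
// const decodeChunk = createVlqDecoder();
// decodeChunk(Buffer.from([0x05, 0x81]));        // returns [5]
// decodeChunk(Buffer.from([0x00, 0x82, 0x2c]));  // returns [128, 300]
```

Each buffer returned by `rs.read()` could be passed to `decodeChunk` in place of the inner `for` loop. Whether this keeps decoding disk-bound rather than CPU-bound would still need measuring.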

EDIT: I made a gist of the fully functioning script.