Reading files into Ruby Numo::NArray

I have a given number of files, all of the same size. I want to load them into a Numo::NArray so that each file occupies its own row. The number of files and their size are known before the NArray is created. I'm currently using 8-bit unsigned integers.

Example: for 5 files of size 512, I would need a two-dimensional array of shape [5, 512]. The data should be treated as elements of a Galois field, which is crucial because this matrix is going to be used in mathematical operations. At the moment I store the data as binary data converted to an array of 8-bit unsigned integers. Sadly, the performance of Ruby's "read" and "unpack('C*')" methods is not high enough.

I have done this with the old version of NArray, but the performance was not good enough: I first had to create an NMatrix of fixed size filled with zeros, load the data into a plain Ruby array, and then replace the corresponding row of the NMatrix. The new library is quite large, and I can't find methods that would, for example, insert a row or dynamically append data to a row. Do I have to declare a fixed-size NArray, or is there a way to do it dynamically by loading the result of file.read directly into the NArray, so that I don't have to create a helper Ruby array?

I would appreciate an optimal solution, as I'm interested in high performance.

Best answer, by binzo:

Suppose each of the files data0 through data4 contains 512 integers stored as 8-bit unsigned binary data. You can then store the values from each file in its own row of a Numo::UInt8 array as follows:

require "numo/narray"

n, m = 5, 512

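# Collect the raw bytes of all files in one binary (ASCII-8BIT) string.
string = String.new
n.times do |k|
  # Read at most m bytes from the k-th file in binary mode.
  string.concat File.binread("data#{k}", m)
end

# Interpret the n * m bytes as an [n, m] matrix of 8-bit unsigned integers.
na = Numo::UInt8.from_binary(string, [n,m])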

This code concatenates the binary data from all files into a single string and converts it in one step with Numo::UInt8.from_binary. When I tried it in my environment, the file loading was the rate-limiting step; I don't think the from_binary method itself is slow.

If you are currently using an HDD for storage, you may want to consider an SSD or another faster storage device.