I have some very large matrices (let say of the order of the million rows), that I can not keep in memory, and I would need to access to subsample of this matrix in descent time (less than a minute...). I started looking at hdf5 and blaze in combination with numpy and pandas:
But I found it a bit complicated, and I am not sure if it is the best solution.
Are there other solutions?
thanks
EDIT
Here some more specifications about the kind of data I am dealing with.
- The matrices are usually sparse (< 10% or < 25% of cells with non-zero)
- The matrices are symmetric
And what I would need to do is:
- Access for reading only
- Extract rectangular sub-matrices (mostly along the diagonal, but also outside)
Did you try PyTables ? It can be very useful for very large matrix. Take a look to this SO post.