I have a NumPy array of shape (34000, 34000) that is too large to hold in memory, so I am using PyTables to store it as an EArray. Because I am constrained by memory, I broke the matrix multiplication into piecewise multiplications, each of which is appended to the EArray.
Here is a simpler example in which the EArray has shape (300, 30000) and every element is 9. I am trying to update it by assigning an entire array to each row.
[[9. 9. 9. ... 9. 9. 9.]
[9. 9. 9. ... 9. 9. 9.]
[9. 9. 9. ... 9. 9. 9.]
...
[9. 9. 9. ... 9. 9. 9.]
[9. 9. 9. ... 9. 9. 9.]
[9. 9. 9. ... 9. 9. 9.]]
However, I need to update the array elements repeatedly. My understanding is that reassignment should work on an EArray, since it inherits the __setitem__ method from tables.Array. Below is a simple piece of code that illustrates how I am updating the rows.
The problem I encountered is that the reassigned values do not persist after the file is closed.
import tables
import tensorflow as tf

hdf5_epath = 'extendable.hdf5'
hdf5_update = tables.open_file(hdf5_epath, mode='r+')
extended_data = hdf5_update.root.data[:]
sess = tf.Session()  # TensorFlow 1.x API
for each in range(len(extended_data)):
    print(extended_data[each])
    abc = tf.ones(34716, tf.float32)
    ones = sess.run(abc)
    extended_data[each] = ones
hdf5_update.close()
Am I doing something wrong, or is PyTables not meant for such a use case?
I'm not familiar with TensorFlow, so I can only help with the PyTables calls in your code. Yes, you can add or update data in an EArray. I have not called EArray.__setitem__() directly to modify data. There is an easier way: simply index the EArray values as you would with NumPy indexing. If you want to add data (rows) to the EArray, use the EArray.append() method. There are examples of both on the PyTables doc site. Review these references for a brief tutorial:
pytables.org: Modifying data in tables
pytables.org: Appending data to an existing table
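As a quick illustration of both operations, here is a minimal sketch on a throwaway file. The file name demo.h5, the node name data, and the tiny (0, 4) shape are assumptions chosen only to keep the example small; they are not taken from your file.

```python
import os
import tempfile

import numpy as np
import tables

# Throwaway file so the demo does not touch any real data.
path = os.path.join(tempfile.mkdtemp(), "demo.h5")

with tables.open_file(path, mode="w") as h5:
    # The 0 in the shape marks the dimension that can grow.
    earray = h5.create_earray(h5.root, "data",
                              atom=tables.Float32Atom(), shape=(0, 4))
    earray.append(np.full((3, 4), 9, dtype=np.float32))  # add three rows of 9s
    earray[1] = np.ones(4, dtype=np.float32)             # overwrite row 1 on disk
    earray.append(np.zeros((1, 4), dtype=np.float32))    # grow by one more row

# Reopen read-only to confirm the changes were written to the file.
with tables.open_file(path, mode="r") as h5:
    result = h5.root.data[:]

print(result)
```

Note that the assignment is made on the EArray node itself (earray[1] = ...), not on an array read out of it.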
In your code, extended_data is a NumPy array, while hdf5_update.root.data is the node that points to the on-disk HDF5 EArray data. Slicing the node with [:] reads the data into memory, so extended_data is a copy and not a view. Modifying extended_data does NOT modify hdf5_update.root.data. That is why the data isn't persistent.
I created a simple example to show how this works. The code below modifies the on-disk data. Its output shows that the values of extended_data and hdf5_update.root.data[:] are different after the EArray is modified: the on-disk data is modified, the in-memory data is not. Scroll down for code to create the example HDF5 file.
CODE TO MODIFY HDF5 EARRAY IN PLACE:
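A minimal sketch of the in-place update, assuming the file holds a 2-D EArray at /data as in your question. It uses np.full in place of the TensorFlow call, and the helper name overwrite_rows is made up for illustration.

```python
import numpy as np
import tables


def overwrite_rows(path, value=1.0):
    """Set every row of /data to `value` by assigning through the EArray node.

    Returns (in_memory_copy, on_disk_after) so the two can be compared.
    Reading the whole array is only sensible for a small example file.
    """
    with tables.open_file(path, mode="r+") as h5:
        earray = h5.root.data      # node bound to the on-disk EArray
        in_memory = earray[:]      # NumPy copy taken BEFORE the update
        n_cols = earray.shape[1]
        for i in range(earray.nrows):
            # Index the node itself so the assignment is written to the file.
            earray[i] = np.full(n_cols, value, dtype=earray.dtype)
        on_disk = earray[:]        # fresh read, reflects the update
    return in_memory, on_disk
```

For example, old, new = overwrite_rows('extendable.hdf5') should leave old holding the original 9s and new holding 1s, provided the file was created with 9s in every element.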
CODE TO CREATE HDF5 USED ABOVE:
Run this to create the extendable.hdf5 file used above. I suggest you inspect the data with HDFView before and after running each code segment.
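A minimal sketch for building the example file, assuming float32 data and the (300, 30000) shape from the question. The function name create_example and its parameters are illustrative, not part of PyTables.

```python
import numpy as np
import tables


def create_example(path="extendable.hdf5", n_rows=300, n_cols=30000,
                   chunk_rows=100):
    """Create /data as an extendable float32 EArray filled with 9s."""
    with tables.open_file(path, mode="w") as h5:
        earray = h5.create_earray(
            h5.root, "data",
            atom=tables.Float32Atom(),
            shape=(0, n_cols),      # the 0 marks the extendable dimension
            expectedrows=n_rows,
        )
        # Append in blocks so the full matrix never has to sit in memory.
        for start in range(0, n_rows, chunk_rows):
            rows = min(chunk_rows, n_rows - start)
            earray.append(np.full((rows, n_cols), 9, dtype=np.float32))
        return earray.shape
```

Calling create_example() with the defaults writes extendable.hdf5 to the current directory; pass smaller n_rows and n_cols for a quick test.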