Is there a way to have column names in an HDF5 dataset (created by h5py) without having to use a numpy.ndarray as the data structure?
I am thinking about something like the following for a dataset with N rows and M columns:
    with h5py.File("foo.h5py", "w") as f:
        dset = f.create_dataset("bar", (N, M), dtype='int8', ...)
        # Access columns via (of course, having defined the names somewhere before)
        dset['col0'] = ...  # equivalent to dset[:,0]
So: there would need to be a way to pass the column names to the dataset and specify an axis they belong to.
Is that possible? I am basically looking for functionality similar to that of a structured numpy array, but as an HDF5-native data type (i.e. without the numpy array).
(It's probably something obvious, but I am a bit stuck...)
Context: I would like to be able to easily resize the dataset, and with h5py this is really easy, e.g. dset.resize(num_columns + 1, axis=1) for adding a new column.
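To make the resize call concrete, here is a minimal sketch. It assumes the dataset was created with a maxshape that leaves the column axis unlimited, since h5py only allows resizing along axes that maxshape permits. The sizes, the file name "resize_demo.h5" and the in-memory "core" driver are illustrative choices, not part of the original setup.

```python
import h5py
import numpy as np

N, M = 4, 3  # hypothetical number of rows and columns

# In-memory file (assumption for the sketch) so nothing is written to disk.
with h5py.File("resize_demo.h5", "w", driver="core", backing_store=False) as f:
    # maxshape=(N, None) leaves the column axis unlimited, which makes
    # the dataset chunked and therefore resizable along axis 1.
    dset = f.create_dataset("bar", shape=(N, M), maxshape=(N, None), dtype="int8")
    dset[...] = np.arange(N * M, dtype="int8").reshape(N, M)

    # Add one column and fill it.
    dset.resize(M + 1, axis=1)
    dset[:, M] = 7

    new_shape = dset.shape
    last_column = dset[:, M]
```

Note that the columns here are still addressed by integer position only; nothing in this sketch attaches names to them.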
With a structured numpy array, it would not be so easy (I would, e.g., need append_fields(...) from np.lib.recfunctions and quite some logic to add a column), which is why I would like to avoid this approach.
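For comparison, a rough sketch of the structured-array route mentioned above. The field names and values are made up for illustration. The point is that append_fields returns a new array with a new dtype instead of growing the existing one in place, so any stored copy would have to be rewritten as well.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Hypothetical table with two named int8 columns and three rows.
table = np.zeros(3, dtype=[("col0", "i1"), ("col1", "i1")])

# Adding a column builds a *new* structured array; `table` is unchanged.
new_col = np.array([1, 2, 3], dtype="i1")
extended = rfn.append_fields(table, "col2", new_col, usemask=False)

original_names = table.dtype.names
extended_names = extended.dtype.names
col2_values = extended["col2"]  # access by column name
```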