h5py: column names without using compound dtype?


Is there a way to have column names in an HDF5 dataset (created with h5py) without having to use a structured numpy.ndarray (compound dtype) as the data structure?

I am thinking about something like the following for a dataset with N rows and M columns:

with h5py.File("foo.h5py", "w") as f:
    dset = f.create_dataset("bar", (N, M), dtype='int8', ...)

    # Desired (hypothetical) access by column name, with the names defined somewhere before
    dset['col0'] = ...  # equivalent to dset[:, 0]

So: there would need to be a way to pass the column names to the dataset and specify an axis they belong to.

Is that possible? I am basically looking for functionality similar to that of a structured numpy array, but as an HDF5-native data type (i.e. without the numpy array).

(It's probably something obvious, but I am a bit stuck...)
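To make the idea concrete, the closest I can get by hand is to store the names in an attribute on the dataset and translate a name to an index along axis 1 myself. This is only a sketch of a workaround, not a built-in h5py feature: the attribute name "column_names", the temporary file path, and the sizes are placeholders I made up for illustration.

```python
import os
import tempfile

import h5py
import numpy as np

N, M = 4, 3
names = ["col0", "col1", "col2"]  # hypothetical column names
path = os.path.join(tempfile.mkdtemp(), "foo.h5")  # placeholder file path

with h5py.File(path, "w") as f:
    dset = f.create_dataset("bar", (N, M), dtype="int8")

    # Store the names as a variable-length string attribute.
    # "column_names" is an arbitrary attribute name chosen for this sketch.
    dset.attrs.create("column_names", names, dtype=h5py.string_dtype())

    # Build a name -> index lookup by hand (decode in case bytes come back).
    stored = [n.decode() if isinstance(n, bytes) else str(n)
              for n in dset.attrs["column_names"]]
    col_index = {name: i for i, name in enumerate(stored)}

    # Emulates the wished-for dset['col1'] = ... via plain integer indexing.
    dset[:, col_index["col1"]] = np.arange(N, dtype="int8")
    col1 = dset[:, col_index["col1"]]
```

This keeps the dataset a plain (N, M) int8 array, but the lookup lives outside h5py, so it has to be kept in sync manually whenever columns are added or reordered.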


Context: I would like to be able to resize the dataset easily, and h5py makes this simple: e.g. dset.resize(num_columns + 1, axis=1) adds a new column.
With a structured numpy array this would be harder (I would need e.g. append_fields(...) from np.lib.recfunctions plus quite a bit of logic to add a column), which is why I would like to avoid that approach.
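For reference, a minimal sketch of the resizing I mean. It assumes the dataset was created with a maxshape that leaves axis 1 unlimited (otherwise resize raises an error); the file path, sizes, and fill values are placeholders.

```python
import os
import tempfile

import h5py

N, M = 4, 3
path = os.path.join(tempfile.mkdtemp(), "foo.h5")  # placeholder file path

with h5py.File(path, "w") as f:
    # maxshape=(N, None) allows axis 1 to grow; h5py chunks the dataset automatically.
    dset = f.create_dataset("bar", (N, M), dtype="int8", maxshape=(N, None))
    dset[...] = 1

    # Add one column by growing axis 1, then fill the new column (index M).
    dset.resize(dset.shape[1] + 1, axis=1)
    dset[:, M] = 7

    new_shape = dset.shape
    first_col = dset[:, 0].tolist()
    last_col = dset[:, M].tolist()
```

The existing data is left untouched by the resize; only the newly added column needs to be written.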
