h5py: assigning or broadcasting to 2×2 column in a structured array

I have a record array with a fixed-size 2×2 item and 10 rows; thus the column is 10×2×2. I would like to assign a constant to the whole column. A NumPy array broadcasts a scalar value correctly, but this does not work in h5py.

import numpy as np
import h5py
dt=np.dtype([('a',('f4',(2,2)))])
# h5py array
h5a=h5py.File('/tmp/t1.h5','w')['/'].require_dataset('test',dtype=dt,shape=(10,))
# numpy for comparison
npa=np.zeros((10,),dtype=dt)


h5a['a']=np.nan 
# ValueError: changing the dtype of a 0d array is only supported if the itemsize is unchanged

npa['a']=np.nan 
# numpy: broadcasts, OK

In fact, I can't find a way to assign the column even without broadcasting:

h5a['a']=np.full((10,2,2),np.nan)
# ValueError: When changing to a larger dtype, its size must be a divisor of the total size in bytes of the last axis of the array

Not even a single row:

h5a['a',0]=np.full((2,2),np.nan)
# ValueError: When changing to a larger dtype, its size must be a divisor of the total size in bytes of the last axis of the array

What is the problem here?

hpaulj On BEST ANSWER
In [69]: d = f.create_dataset('test', dtype=dt, shape=(3,))

We can make an array of the same shape and dtype:

In [90]: x=np.ones(3,dt)
In [91]: x[:]=2
In [92]: x
Out[92]: 
array([([[2., 2.], [2., 2.]],), ([[2., 2.], [2., 2.]],),
       ([[2., 2.], [2., 2.]],)], dtype=[('a', '<f4', (2, 2))])

and assign it to the dataset:

In [93]: d[:]=x
In [94]: d
Out[94]: <HDF5 dataset "test": shape (3,), type "|V16">
In [95]: d[:]
Out[95]: 
array([([[2., 2.], [2., 2.]],), ([[2., 2.], [2., 2.]],),
       ([[2., 2.], [2., 2.]],)], dtype=[('a', '<f4', (2, 2))])
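Applied to the 10-row dataset from the question, the same idea might look like the sketch below. It assumes that writing whole records with a matching dtype behaves as in the session above; the temporary file path and the name buf are made up for illustration.

```python
import os
import tempfile

import h5py
import numpy as np

dt = np.dtype([('a', ('f4', (2, 2)))])

# Temporary file path chosen for illustration only.
path = os.path.join(tempfile.mkdtemp(), 't1.h5')

with h5py.File(path, 'w') as f:
    d = f.create_dataset('test', dtype=dt, shape=(10,))

    # Build the values in memory, where numpy broadcasting works.
    buf = np.zeros(d.shape, dtype=dt)
    buf['a'] = np.nan

    # Write whole records; the value dtype matches the dataset dtype.
    d[:] = buf

    # Read back to check what was stored.
    result = d[:]
```

The broadcasting happens on the numpy side, so h5py only ever sees a structured array whose dtype already matches the dataset.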

We can also make a single element array with the correct dtype, and assign that:

In [116]: x=np.array((np.arange(4).reshape(2,2),),dt)
In [117]: x
Out[117]: array(([[0., 1.], [2., 3.]],), dtype=[('a', '<f4', (2, 2))])
In [118]: d[0]=x

With h5py we can index with record and field as:

In [119]: d[0,'a']
Out[119]: 
array([[0., 1.],
       [2., 3.]], dtype=float32)

Whereas an ndarray requires a double index, as in d[0]['a'].

h5py tries to imitate ndarray indexing, but it is not exactly the same. We just have to accept that.

edit

The [118] assignment can also be written as:

In [207]: d[1,'a']=x

The dt here has just one field, but I think this should work with multiple fields. The key is that the value has to be a structured array that matches the field specification of d.

I just noticed in the docs that they are trying to move away from the d[1,'a'] indexing, instead using d[1]['a']. But for assignment that doesn't seem to work: no error, just no action. I think d[1] or d['a'] is a copy, the equivalent of advanced indexing for arrays. For a structured array those are views.
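A small sketch of that difference, under the assumption that reading d['a'] returns an in-memory copy as guessed above. The file path and the names col, buf, unchanged and updated are illustrative only.

```python
import os
import tempfile

import h5py
import numpy as np

dt = np.dtype([('a', ('f4', (2, 2)))])

# numpy: x['a'] is a view, so the write reaches x itself.
x = np.zeros(3, dtype=dt)
x['a'][1] = 5.0

# Temporary file path chosen for illustration only.
path = os.path.join(tempfile.mkdtemp(), 'view_vs_copy.h5')

with h5py.File(path, 'w') as f:
    d = f.create_dataset('test', dtype=dt, shape=(3,))

    # h5py: d['a'] reads the field into a new array; editing it
    # does not touch the data stored in the file.
    col = d['a']
    col[1] = 5.0
    unchanged = d['a'][1]

    # Read whole records, modify in memory, write them back.
    buf = d[:]
    buf['a'][1] = 5.0
    d[:] = buf
    updated = d['a'][1]
```

Here unchanged is read after the edit to the copy and updated after the explicit write-back, so comparing the two shows which operation actually reached the file.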