Suppose I have the following numpy structured array:
In [250]: x
Out[250]:
array([(22, 2, -1000000000, 2000), (22, 2, 400, 2000),
(22, 2, 804846, 2000), (44, 2, 800, 4000), (55, 5, 900, 5000),
(55, 5, 1000, 5000), (55, 5, 8900, 5000), (55, 5, 11400, 5000),
(33, 3, 14500, 3000), (33, 3, 40550, 3000), (33, 3, 40990, 3000),
(33, 3, 44400, 3000)],
dtype=[('f1', '<i4'), ('f2', '<f4'), ('f3', '<f4'), ('f4', '<i4')])
I am trying to modify a subset of the above array to a regular numpy array. It is essential for my application that no copies are created (only views).
Fields are retrieved from the above structured array by using the following function:
def fields_view(array, fields):
return array.getfield(numpy.dtype(
{name: array.dtype.fields[name] for name in fields}
))
If I am interested in fields 'f2' and 'f3', I would do the following:
In [251]: y=fields_view(x,['f2','f3'])
In [252]: y
Out [252]:
array([(2.0, -1000000000.0), (2.0, 400.0), (2.0, 804846.0), (2.0, 800.0),
(5.0, 900.0), (5.0, 1000.0), (5.0, 8900.0), (5.0, 11400.0),
(3.0, 14500.0), (3.0, 40550.0), (3.0, 40990.0), (3.0, 44400.0)],
dtype={'names':['f2','f3'], 'formats':['<f4','<f4'], 'offsets':[4,8], 'itemsize':12})
There is a way to directly get an ndarray from the 'f2' and 'f3' fields of the original structured array. However, for my application, it is necessary to build this intermediary structured array as this data subset is an attribute of a class.
I can't convert the intermediary structured array to a regular numpy array without doing a copy.
In [253]: y.view(('<f4', len(y.dtype.names)))
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-54-f8fc3a40fd1b> in <module>()
----> 1 y.view(('<f4', len(y.dtype.names)))
ValueError: new type not compatible with array.
This function can also be used to convert a record array to an ndarray:
def recarr_to_ndarr(x,typ):
fields = x.dtype.names
shape = x.shape + (len(fields),)
offsets = [x.dtype.fields[name][1] for name in fields]
assert not any(np.diff(offsets, n=2))
strides = x.strides + (offsets[1] - offsets[0],)
y = np.ndarray(shape=shape, dtype=typ, buffer=x,
offset=offsets[0], strides=strides)
return y
However, I get the following error:
In [254]: recarr_to_ndarr(y,'<f4')
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-65-2ebda2a39e9f> in <module>()
----> 1 recarr_to_ndarr(y,'<f4')
<ipython-input-62-8a9eea8e7512> in recarr_to_ndarr(x, typ)
8 strides = x.strides + (offsets[1] - offsets[0],)
9 y = np.ndarray(shape=shape, dtype=typ, buffer=x,
---> 10 offset=offsets[0], strides=strides)
11 return y
12
TypeError: expected a single-segment buffer object
The function works fine if I create a copy:
In [255]: recarr_to_ndarr(np.array(y),'<f4')
Out[255]:
array([[ 2.00000000e+00, -1.00000000e+09],
[ 2.00000000e+00, 4.00000000e+02],
[ 2.00000000e+00, 8.04846000e+05],
[ 2.00000000e+00, 8.00000000e+02],
[ 5.00000000e+00, 9.00000000e+02],
[ 5.00000000e+00, 1.00000000e+03],
[ 5.00000000e+00, 8.90000000e+03],
[ 5.00000000e+00, 1.14000000e+04],
[ 3.00000000e+00, 1.45000000e+04],
[ 3.00000000e+00, 4.05500000e+04],
[ 3.00000000e+00, 4.09900000e+04],
[ 3.00000000e+00, 4.44000000e+04]], dtype=float32)
There seems to be no difference between the two arrays:
In [66]: y
Out[66]:
array([(2.0, -1000000000.0), (2.0, 400.0), (2.0, 804846.0), (2.0, 800.0),
(5.0, 900.0), (5.0, 1000.0), (5.0, 8900.0), (5.0, 11400.0),
(3.0, 14500.0), (3.0, 40550.0), (3.0, 40990.0), (3.0, 44400.0)],
dtype={'names':['f2','f3'], 'formats':['<f4','<f4'], 'offsets':[4,8], 'itemsize':12})
In [67]: np.array(y)
Out[67]:
array([(2.0, -1000000000.0), (2.0, 400.0), (2.0, 804846.0), (2.0, 800.0),
(5.0, 900.0), (5.0, 1000.0), (5.0, 8900.0), (5.0, 11400.0),
(3.0, 14500.0), (3.0, 40550.0), (3.0, 40990.0), (3.0, 44400.0)],
dtype={'names':['f2','f3'], 'formats':['<f4','<f4'], 'offsets':[4,8], 'itemsize':12})
hpaulj was right in saying that the problem is that the subset of the structured array is not contiguous. Interestingly, I figured out a way to make the array subset contiguous with the following function:
The old function: