The problem I encounter is that using ndarray.view(dtype) to get a structured array from a classic ndarray seems to miscompute the float-to-int conversion. An example explains it better:
In [12]: B
Out[12]:
array([[ 1.00000000e+00, 1.00000000e+00, 0.00000000e+00,
0.00000000e+00, 4.43600000e+01, 0.00000000e+00],
[ 1.00000000e+00, 2.00000000e+00, 7.10000000e+00,
1.10000000e+00, 4.43600000e+01, 1.32110000e+02],
[ 1.00000000e+00, 3.00000000e+00, 9.70000000e+00,
2.10000000e+00, 4.43600000e+01, 2.04660000e+02],
...,
[ 1.28900000e+03, 1.28700000e+03, 0.00000000e+00,
9.99999000e+05, 4.75600000e+01, 3.55374000e+03],
[ 1.28900000e+03, 1.28800000e+03, 1.29000000e+01,
5.40000000e+00, 4.19200000e+01, 2.08400000e+02],
[ 1.28900000e+03, 1.28900000e+03, 0.00000000e+00,
0.00000000e+00, 4.19200000e+01, 0.00000000e+00]])
In [14]: B.view(A.dtype)
Out[14]:
array([(4607182418800017408, 4607182418800017408, 0.0, 0.0, 44.36, 0.0),
(4607182418800017408, 4611686018427387904, 7.1, 1.1, 44.36, 132.11),
(4607182418800017408, 4613937818241073152, 9.7, 2.1, 44.36, 204.66),
...,
(4653383897399164928, 4653375101306142720, 0.0, 999999.0, 47.56, 3553.74),
(4653383897399164928, 4653379499352653824, 12.9, 5.4, 41.92, 208.4),
(4653383897399164928, 4653383897399164928, 0.0, 0.0, 41.92, 0.0)],
dtype=[('i', '<i8'), ('j', '<i8'), ('tnvtc', '<f8'), ('tvtc', '<f8'), ('tf', '<f8'), ('tvps', '<f8')])
The 'i' and 'j' columns should be true integers. Here are two further checks I have done; the problem seems to come from ndarray.view(np.int):
In [21]: B[:,:2]
Out[21]:
array([[ 1.00000000e+00, 1.00000000e+00],
[ 1.00000000e+00, 2.00000000e+00],
[ 1.00000000e+00, 3.00000000e+00],
...,
[ 1.28900000e+03, 1.28700000e+03],
[ 1.28900000e+03, 1.28800000e+03],
[ 1.28900000e+03, 1.28900000e+03]])
In [22]: B[:,:2].view(np.int)
Out[22]:
array([[4607182418800017408, 4607182418800017408],
[4607182418800017408, 4611686018427387904],
[4607182418800017408, 4613937818241073152],
...,
[4653383897399164928, 4653375101306142720],
[4653383897399164928, 4653379499352653824],
[4653383897399164928, 4653383897399164928]])
In [23]: B[:,:2].astype(np.int)
Out[23]:
array([[ 1, 1],
[ 1, 2],
[ 1, 3],
...,
[1289, 1287],
[1289, 1288],
[1289, 1289]])
What am I doing wrong? Can't I change the type because of the way numpy allocates memory? Is there another way to do this? (fromarrays was complaining about a shape mismatch.)
This is the difference between doing somearray.view(new_dtype) and calling astype.

What you're seeing is exactly the expected behavior, and it's very deliberate, but it's surprising the first time you come across it.
A view with a different dtype interprets the underlying memory buffer of the array as the given dtype. No copies are made. It's very powerful, but you have to understand what you're doing.
A key thing to remember is that calling view never alters the underlying memory buffer, just the way that it's viewed by numpy (e.g. dtype, shape, strides). Therefore, view deliberately avoids altering the data to the new type and instead just interprets the "old bits" as the new dtype. For example:
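A minimal sketch of that bit reinterpretation (the large integers match the ones in the question's output, since 1.0, 2.0, and 3.0 have those IEEE-754 bit patterns):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])

# view() keeps the same 8 bytes per element and just relabels them
# as int64: you get the raw IEEE-754 bit pattern, not 1, 2, 3.
print(a.view(np.int64))
# [4607182418800017408 4611686018427387904 4613937818241073152]
# e.g. 1.0 is 0x3FF0000000000000 == 4607182418800017408
```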
If you want to make a copy of the array with a new dtype, use astype instead.

However, using astype with structured arrays will probably surprise you. Structured arrays treat each element of the input as a C-like struct. Therefore, if you call astype, you'll run into several surprises.

Basically, you want the columns to have different dtypes. In that case, don't put them in the same array. Numpy arrays are expected to be homogeneous. Structured arrays are handy in certain cases, but they're probably not what you want if you need to handle separate columns of data. Just use each column as its own array.
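A sketch of the copy-and-convert route for the case above, converting each column separately instead of reinterpreting bits (field names follow the question's dtype; only three of the six columns are shown for brevity):

```python
import numpy as np

b = np.array([[1.0, 1.0, 44.36],
              [1.0, 2.0, 132.11]])

# astype makes a *copy*, converting each value to the new type.
ints = b[:, :2].astype(np.int64)   # [[1, 1], [1, 2]]

# To get a structured array with true integer 'i'/'j' fields,
# assign column by column; the float -> int conversion happens
# during the assignment.
dt = np.dtype([('i', '<i8'), ('j', '<i8'), ('tvps', '<f8')])
out = np.empty(b.shape[0], dtype=dt)
out['i'] = b[:, 0]
out['j'] = b[:, 1]
out['tvps'] = b[:, 2]
print(out)   # [(1, 1, 44.36) (1, 2, 132.11)]
```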
Better yet, if you're working with tabular data, you'll probably find it easier to use pandas than to use numpy arrays directly. pandas is oriented towards tabular data (where columns are expected to have different types), while numpy is oriented towards homogeneous arrays.
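For instance, a pandas sketch of the question's table (column names taken from the question's dtype; assumes pandas is installed, and only two rows are shown):

```python
import numpy as np
import pandas as pd

B = np.array([[1.0, 1.0, 0.0, 0.0, 44.36, 0.0],
              [1.0, 2.0, 7.1, 1.1, 44.36, 132.11]])

df = pd.DataFrame(B, columns=['i', 'j', 'tnvtc', 'tvtc', 'tf', 'tvps'])

# Per-column dtypes are natural in pandas: convert just 'i' and 'j'
# to integers while the other columns stay float64.
df['i'] = df['i'].astype('int64')
df['j'] = df['j'].astype('int64')
print(df.dtypes)
```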