The problem I encounter is that using ndarray.view(dtype) to get a structured array from a classic ndarray seems to miscompute the float-to-int conversion. An example explains it better:
In [12]: B
Out[12]:
array([[ 1.00000000e+00, 1.00000000e+00, 0.00000000e+00,
0.00000000e+00, 4.43600000e+01, 0.00000000e+00],
[ 1.00000000e+00, 2.00000000e+00, 7.10000000e+00,
1.10000000e+00, 4.43600000e+01, 1.32110000e+02],
[ 1.00000000e+00, 3.00000000e+00, 9.70000000e+00,
2.10000000e+00, 4.43600000e+01, 2.04660000e+02],
...,
[ 1.28900000e+03, 1.28700000e+03, 0.00000000e+00,
9.99999000e+05, 4.75600000e+01, 3.55374000e+03],
[ 1.28900000e+03, 1.28800000e+03, 1.29000000e+01,
5.40000000e+00, 4.19200000e+01, 2.08400000e+02],
[ 1.28900000e+03, 1.28900000e+03, 0.00000000e+00,
0.00000000e+00, 4.19200000e+01, 0.00000000e+00]])
In [14]: B.view(A.dtype)
Out[14]:
array([(4607182418800017408, 4607182418800017408, 0.0, 0.0, 44.36, 0.0),
(4607182418800017408, 4611686018427387904, 7.1, 1.1, 44.36, 132.11),
(4607182418800017408, 4613937818241073152, 9.7, 2.1, 44.36, 204.66),
...,
(4653383897399164928, 4653375101306142720, 0.0, 999999.0, 47.56, 3553.74),
(4653383897399164928, 4653379499352653824, 12.9, 5.4, 41.92, 208.4),
(4653383897399164928, 4653383897399164928, 0.0, 0.0, 41.92, 0.0)],
dtype=[('i', '<i8'), ('j', '<i8'), ('tnvtc', '<f8'), ('tvtc', '<f8'), ('tf', '<f8'), ('tvps', '<f8')])
The 'i' and 'j' columns should be true integers. Here are two further checks I have done; the problem seems to come from ndarray.view(np.int):
In [21]: B[:,:2]
Out[21]:
array([[ 1.00000000e+00, 1.00000000e+00],
[ 1.00000000e+00, 2.00000000e+00],
[ 1.00000000e+00, 3.00000000e+00],
...,
[ 1.28900000e+03, 1.28700000e+03],
[ 1.28900000e+03, 1.28800000e+03],
[ 1.28900000e+03, 1.28900000e+03]])
In [22]: B[:,:2].view(np.int)
Out[22]:
array([[4607182418800017408, 4607182418800017408],
[4607182418800017408, 4611686018427387904],
[4607182418800017408, 4613937818241073152],
...,
[4653383897399164928, 4653375101306142720],
[4653383897399164928, 4653379499352653824],
[4653383897399164928, 4653383897399164928]])
In [23]: B[:,:2].astype(np.int)
Out[23]:
array([[ 1, 1],
[ 1, 2],
[ 1, 3],
...,
[1289, 1287],
[1289, 1288],
[1289, 1289]])
What am I doing wrong? Can't I change the type because of the way numpy allocates memory? Is there another way to do this? (fromarrays was complaining about a shape mismatch.)
This is the difference between doing somearray.view(new_dtype) and calling astype.

What you're seeing is exactly the expected behavior, and it's very deliberate, but it's surprising the first time you come across it.
A view with a different dtype interprets the underlying memory buffer of the array as the given dtype. No copies are made. It's very powerful, but you have to understand what you're doing.
A key thing to remember is that calling view never alters the underlying memory buffer, just the way that it's viewed by numpy (e.g. dtype, shape, strides). Therefore, view deliberately avoids altering the data to the new type and instead just interprets the "old bits" as the new dtype. For example:
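A minimal sketch of that bit reinterpretation (the large integers match the ones in the question's output, since 1.0, 2.0, and 3.0 have those IEEE-754 bit patterns):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])

# view() keeps the same 8 bytes per element and just relabels them
# as int64: you get the raw IEEE-754 bit pattern, not 1, 2, 3.
print(a.view(np.int64))
# [4607182418800017408 4611686018427387904 4613937818241073152]
# e.g. 1.0 is 0x3FF0000000000000 == 4607182418800017408
```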
If you want to make a copy of the array with a new dtype, use astype instead.

However, using astype with structured arrays will probably surprise you. Structured arrays treat each element of the input as a C-like struct. Therefore, if you call astype, you'll run into several surprises.

Basically, you want the columns to have different dtypes. In that case, don't put them in the same array. Numpy arrays are expected to be homogeneous. Structured arrays are handy in certain cases, but they're probably not what you want if you need to handle separate columns of data. Just use each column as its own array.
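A sketch of the copy-and-convert route for the case above, converting each column separately instead of reinterpreting bits (field names follow the question's dtype; only three of the six columns are shown for brevity):

```python
import numpy as np

b = np.array([[1.0, 1.0, 44.36],
              [1.0, 2.0, 132.11]])

# astype makes a *copy*, converting each value to the new type.
ints = b[:, :2].astype(np.int64)   # [[1, 1], [1, 2]]

# To get a structured array with true integer 'i'/'j' fields,
# assign column by column; the float -> int conversion happens
# during the assignment.
dt = np.dtype([('i', '<i8'), ('j', '<i8'), ('tvps', '<f8')])
out = np.empty(b.shape[0], dtype=dt)
out['i'] = b[:, 0]
out['j'] = b[:, 1]
out['tvps'] = b[:, 2]
print(out)   # [(1, 1, 44.36) (1, 2, 132.11)]
```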
Better yet, if you're working with tabular data, you'll probably find it easier to use pandas than to use numpy arrays directly. pandas is oriented towards tabular data (where columns are expected to have different types), while numpy is oriented towards homogeneous arrays.
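For instance, a pandas sketch of the question's table (column names taken from the question's dtype; assumes pandas is installed, and only two rows are shown):

```python
import numpy as np
import pandas as pd

B = np.array([[1.0, 1.0, 0.0, 0.0, 44.36, 0.0],
              [1.0, 2.0, 7.1, 1.1, 44.36, 132.11]])

df = pd.DataFrame(B, columns=['i', 'j', 'tnvtc', 'tvtc', 'tf', 'tvps'])

# Per-column dtypes are natural in pandas: convert just 'i' and 'j'
# to integers while the other columns stay float64.
df['i'] = df['i'].astype('int64')
df['j'] = df['j'].astype('int64')
print(df.dtypes)
```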