Appropriate formatting of NumPy dtypes for arrays within Structured Arrays


I am trying to create a NumPy structured array, but I can't figure out the correct way to specify the column titles and column types for arrays within arrays. I keep getting the "setting an array element with a sequence" error. I can convert the list into an unstructured array without a problem, so I assume the problem is in how the dtypes of the sub-arrays are specified.

Code

import numpy as np
import scipy.stats as sts  # assumed alias: the snippet calls sts.lognorm.rvs

#Number of People
numOfP=5
#Array of people's ids
ids=np.arange(numOfP,dtype='int64')
#People object
temp=[]
peoType=np.dtype({
    'names':
    ['id','value','ability','helpNeeded','helpOut','helpIn'],
    'formats':
    ['int64','float64','float32','float32','object','object'],
    'aligned':True
})
#Populate people with attributes
for id in ids:
    temp.append([
        #0 - id
        id,
        #1 - people's value
        sts.lognorm.rvs(.5)*100000,
        #2 - people's ability
        (1/(sts.lognorm.rvs(.99)+1)),
        #3 - help needed
        ((sts.lognorm.rvs(.99))*100),
        #4 - people helped
#This is where the problem is, if I get rid of these arrays, and the associated dtypes, there are no errors
        np.zeros(numOfP),
        #5 - people who helped you
        np.zeros(numOfP)
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    ])
peoType
temp
#doing np.array(temp), without the dtype works
temp=np.asarray(temp)      #doesn't change anything
temp
peo=np.array(temp,peoType) #where things break

dtype

{'names': ['id', 'value', 'ability', 'helpNeeded', 'helpOut', 'helpIn'],
 'formats': ['int64', 'float64', 'float32', 'float32', 'object', 'object'],
 'aligned': True}

Error message

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
e:\xampp\htdocs\Math2Code\cooperate.py in 
     52     ])
     53 peoType
---> 54 peo=np.array(temp,peoType)

ValueError: setting an array element with a sequence.

Contents of temp List

[[0,
  86381.14170220899,
  0.12974876676966007,
  49.537761763004056,
  array([0., 0., 0., 0., 0.]),
  array([0., 0., 0., 0., 0.])],
 [1,
  95532.94886721167,
  0.3886984384013719,
  49.9244719570076,
  array([0., 0., 0., 0., 0.]),
  array([0., 0., 0., 0., 0.])],
 [2,
  53932.09250542036,
  0.6518993291826463,
  92.72979425242384,
  array([0., 0., 0., 0., 0.]),
  array([0., 0., 0., 0., 0.])],
 [3,
  161978.14156816195,
  0.49130827569636754,
  56.44742176255372,
  array([0., 0., 0., 0., 0.]),
  array([0., 0., 0., 0., 0.])],
 [4,
  38679.21128565417,
  0.6979042712239539,
  132.35562828412765,
  array([0., 0., 0., 0., 0.]),
  array([0., 0., 0., 0., 0.])]]

Contents of temp after conversion to an unstructured array

array([[0, 119297.86954924025, 0.38806815548557444, 487.4877681755314,
        array([0., 0., 0., 0., 0.]), array([0., 0., 0., 0., 0.])],
       [1, 75215.69897153028, 0.5387632600167043, 83.27487024641633,
        array([0., 0., 0., 0., 0.]), array([0., 0., 0., 0., 0.])],
       [2, 88986.345811315, 0.2533847055636237, 48.52795408229029,
        array([0., 0., 0., 0., 0.]), array([0., 0., 0., 0., 0.])],
       [3, 80539.81607335186, 0.27683829962996226, 226.25682883690638,
        array([0., 0., 0., 0., 0.]), array([0., 0., 0., 0., 0.])],
       [4, 40429.11615682778, 0.5748035151329913, 226.69671215072958,
        array([0., 0., 0., 0., 0.]), array([0., 0., 0., 0., 0.])]],
      dtype=object)

Output of the peoType dtype when used to create a two-element np.zeros array:

Input

np.zeros(2, peoType)

Output

array([(0, 0., 0., 0., 0, 0), (0, 0., 0., 0., 0, 0)],
      dtype={'names':['id','value','ability','helpNeeded','helpOut','helpIn'], 'formats':['<i8','<f8','<f4','<f4','O','O'], 'offsets':[0,8,16,20,24,32], 'itemsize':40, 'aligned':True})

Why are the rows wrapped in tuples?


There are 2 answers

NaN On

Too big for a comment, but this demonstrates that the input must be a tuple to produce the structured array. If vals is a list, you will get an error. The sample below uses one of your inputs.

vals = (2,
  53932.09250542036,
  0.6518993291826463,
  92.72979425242384,
  np.array([0., 0., 0., 0., 0.]),
  np.array([0., 0., 0., 0., 0.]))

dt={'names':['id','value','ability','helpNeeded','helpOut','helpIn'], 'formats':['<i8','<f8','<f4','<f4','O','O']}

a = np.asarray(vals, dtype=dt)

a
array((2,  53932.09,  0.65,  92.73, array([ 0.00,  0.00,  0.00,  0.00,  0.00]), array([ 0.00,  0.00,  0.00,  0.00,  0.00])),
      dtype=[('id', '<i8'), ('value', '<f8'), ('ability', '<f4'), ('helpNeeded', '<f4'), ('helpOut', 'O'), ('helpIn', 'O')])
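
To make the tuple-versus-list contrast concrete, here is a minimal sketch on a smaller dtype. The two-field dtype and the names row, rec and list_failed are made up for illustration; they are not from the question. The list case is expected to raise, as it does in the question, but the exact behaviour may depend on the NumPy version, so the sketch only records whether it did.

```python
import numpy as np

# Hypothetical two-field dtype, smaller than the one in the question.
dt = np.dtype([('id', 'int64'), ('helpOut', 'object')])

row = [7, np.zeros(3)]  # a list, like one row of the question's temp

# A tuple is read as a single record: one value per field.
rec = np.array(tuple(row), dtype=dt)

# A list is read as a sequence of separate array elements, so NumPy
# tries to turn each item into a whole record. This is expected to fail.
try:
    np.array(row, dtype=dt)
    list_failed = False
except Exception:
    list_failed = True

print(rec.shape, rec['id'], list_failed)
```

The result rec is a 0-d structured array, which matches the array((...), dtype=...) display above.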
hpaulj On

Your compound dtype:

In [33]: peoType=np.dtype({
    ...:     'names':
    ...:     ['id','value','ability','helpNeeded','helpOut','helpIn'],
    ...:     'formats':
    ...:     ['int64','float64','float32','float32','object','object'],
    ...:     'aligned':True
    ...: })

A sample structured array with that dtype:

In [34]: arr = np.zeros(2, peoType)
In [35]: arr
Out[35]: 
array([(0, 0., 0., 0., 0, 0), (0, 0., 0., 0., 0, 0)],
      dtype={'names':['id','value','ability','helpNeeded','helpOut','helpIn'], 'formats':['<i8','<f8','<f4','<f4','O','O'], 'offsets':[0,8,16,20,24,32], 'itemsize':40, 'aligned':True})
In [36]: arr['id']
Out[36]: array([0, 0])
In [37]: arr['helpOut']
Out[37]: array([0, 0], dtype=object)

() is used to mark individual records. This is a 1d array, with records, not rows and columns. The notation tries to make this clear. Operations like reshape and broadcasting don't cross that record boundary.
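
A small sketch of what "1d array with records" means in practice. The dtype here is a cut-down, hypothetical one with two numeric fields, and the variable names are only for illustration.

```python
import numpy as np

# Hypothetical dtype with two plain numeric fields.
dt = np.dtype([('id', 'int64'), ('value', 'float64')])
arr = np.zeros(4, dtype=dt)

print(arr.shape)        # one axis only: the fields are not a second axis
print(arr.dtype.names)  # field names live on the dtype, not on an axis
print(arr[0])           # a single record, displayed in parentheses

# reshape rearranges whole records; each record keeps both of its fields.
grid = arr.reshape(2, 2)
print(grid['id'].shape)
```

Indexing by field name always gives an array with the same shape as the record array, which is why reshape never splits a record apart.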

Make your temp list:

In [39]: array = np.array
In [40]: temp=[[0,
    ...:   86381.14170220899,
    ...:   0.12974876676966007,
    ...:   49.537761763004056,
    ...:   array([0., 0., 0., 0., 0.]),
    ...:   array([0., 0., 0., 0., 0.])],
    ...:  [1,
    ...:   95532.94886721167,
    ...:   0.3886984384013719,
    ...:   49.9244719570076,
    ...:   array([0., 0., 0., 0., 0.]),
    ...:   array([0., 0., 0., 0., 0.])],
    ...:  [2,
    ...:   53932.09250542036,
    ...:   0.6518993291826463,
    ...:   92.72979425242384,
    ...:   array([0., 0., 0., 0., 0.]),
    ...:   array([0., 0., 0., 0., 0.])],
    ...:  [3,
    ...:   161978.14156816195,
    ...:   0.49130827569636754,
    ...:   56.44742176255372,
    ...:   array([0., 0., 0., 0., 0.]),
    ...:   array([0., 0., 0., 0., 0.])],
    ...:  [4,
    ...:   38679.21128565417,
    ...:   0.6979042712239539,
    ...:   132.35562828412765,
    ...:   array([0., 0., 0., 0., 0.]),
    ...:   array([0., 0., 0., 0., 0.])]]

Make a structured array from the list, first converting it into a list of tuples, as structured arrays require:

In [42]: arr = np.array([tuple(row) for row in temp], peoType)
In [43]: arr
Out[43]: 
array([(0,  86381.14170221, 0.12974876,  49.53776 , array([0., 0., 0., 0., 0.]), array([0., 0., 0., 0., 0.])),
       (1,  95532.94886721, 0.38869843,  49.924473, array([0., 0., 0., 0., 0.]), array([0., 0., 0., 0., 0.])),
       (2,  53932.09250542, 0.65189934,  92.7298  , array([0., 0., 0., 0., 0.]), array([0., 0., 0., 0., 0.])),
       (3, 161978.14156816, 0.49130827,  56.447422, array([0., 0., 0., 0., 0.]), array([0., 0., 0., 0., 0.])),
       (4,  38679.21128565, 0.6979043 , 132.35562 , array([0., 0., 0., 0., 0.]), array([0., 0., 0., 0., 0.]))],
      dtype={'names':['id','value','ability','helpNeeded','helpOut','helpIn'], 'formats':['<i8','<f8','<f4','<f4','O','O'], 'offsets':[0,8,16,20,24,32], 'itemsize':40, 'aligned':True})
In [44]: arr['helpOut']
Out[44]: 
array([array([0., 0., 0., 0., 0.]), array([0., 0., 0., 0., 0.]),
       array([0., 0., 0., 0., 0.]), array([0., 0., 0., 0., 0.]),
       array([0., 0., 0., 0., 0.])], dtype=object)

The object dtype field is a 1d array whose elements are objects, in this case arrays.
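
Because the field only stores references to separate arrays, NumPy does not treat it as one numeric block. If the inner arrays happen to have the same length, they can be stacked afterwards. A rough sketch, using a hypothetical two-field dtype and made-up names:

```python
import numpy as np

# Hypothetical dtype with one object field.
dt = np.dtype([('id', 'int64'), ('helpOut', 'object')])
rows = [(i, np.zeros(3)) for i in range(4)]  # list of tuples, one per record
arr = np.array(rows, dtype=dt)

col = arr['helpOut']           # 1d object array, one inner array per record
stacked = np.stack(list(col))  # works only if all inner arrays share a shape

print(col.shape, col.dtype)
print(stacked.shape, stacked.dtype)
```

The upside of the object field is that the inner arrays may differ in length; the cost is that anything numeric needs a step like this first.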

If all those object fields contained the same size arrays, we could replace them with multi-item fields:

In [50]: dt=np.dtype({
    ...:     'names':
    ...:     ['id','value','ability','helpNeeded','helpOut','helpIn'],
    ...:     'formats':
    ...:     ['int64','float64','float32','float32','5float','5float'],
    ...:     'aligned':True
    ...: })
In [51]: arr = np.array([tuple(row) for row in temp], dt)
In [52]: arr
Out[52]: 
array([(0,  86381.14170221, 0.12974876,  49.53776 , [0., 0., 0., 0., 0.], [0., 0., 0., 0., 0.]),
       (1,  95532.94886721, 0.38869843,  49.924473, [0., 0., 0., 0., 0.], [0., 0., 0., 0., 0.]),
       (2,  53932.09250542, 0.65189934,  92.7298  , [0., 0., 0., 0., 0.], [0., 0., 0., 0., 0.]),
       (3, 161978.14156816, 0.49130827,  56.447422, [0., 0., 0., 0., 0.], [0., 0., 0., 0., 0.]),
       (4,  38679.21128565, 0.6979043 , 132.35562 , [0., 0., 0., 0., 0.], [0., 0., 0., 0., 0.])],
      dtype={'names':['id','value','ability','helpNeeded','helpOut','helpIn'], 'formats':['<i8','<f8','<f4','<f4',('<f8', (5,)),('<f8', (5,))], 'offsets':[0,8,16,20,24,64], 'itemsize':104, 'aligned':True})
In [53]: arr['helpOut']
Out[53]: 
array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])

Now that field produces a 2d array.
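
With fixed-size fields it is also possible to skip the list of tuples and fill a preallocated array field by field. This is only a sketch of that alternative, not what the question's code does: the dtype is shortened, the fill values are placeholders, and the names peo and totals are for illustration.

```python
import numpy as np

n = 5  # number of people, playing the role of numOfP in the question

# Each field is (name, base type[, subarray shape]).
dt = np.dtype([
    ('id', 'int64'),
    ('value', 'float64'),
    ('helpOut', 'float64', (n,)),
    ('helpIn', 'float64', (n,)),
])

peo = np.zeros(n, dtype=dt)  # every field starts at zero
peo['id'] = np.arange(n)     # assign a whole field at once
peo['value'] = 100.0         # placeholder value, broadcast to all records

# peo['helpOut'] is an (n, n) view, so writes go through to peo.
peo['helpOut'][1, 3] = 2.5

totals = peo['helpOut'].sum(axis=1)  # per-person sum over the subarray
print(peo['helpOut'].shape, totals)
```

Since the subarray field behaves like an ordinary 2d float array, reductions and slicing work on it directly.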