Say I have the following 2d-array
>>> import numpy as np
>>> budgets = np.array([
[np.nan, 450.],
[500. , 100.],
[np.nan, 900.],
])
whose values are positioned like so
>>> coords = [
('name' , ['Jack_teen' , 'John_adult', 'John_teen']), # over rows
('hobby', ['books', 'bicyle']), # over columns
]
Using xarray, I can create a labeled 2-D array:
>>> import xarray as xr
>>> x = xr.DataArray(budgets, coords=coords)
So when John was a teenager, he did not like books, which shows up as a missing budget for that period:
>>> x.sel(name='John_teen', hobby='books')
<xarray.DataArray ()>
array(nan)
Coordinates:
name |S10 'John_teen'
hobby |S6 'books'
This changed with age:
>>> x.sel(name='John_adult', hobby='books')
<xarray.DataArray ()>
array(500.0)
Coordinates:
name |S10 'John_adult'
hobby |S6 'books'
My question:
How can I turn this labeled 2-D array into a labeled 3-D array with a new dimension called age (whose coordinates would thus be ['adult', 'teen']), while simplifying the coordinates of the dimension name?
Note that the coordinates of name are always structured with a separating underscore, i.e. as NAME_AGE. The starting object is, of course, x.
Is there a built-in way to do this in xarray? If not, what is the fastest/cheapest approach?
Since we eventually want a dimension 'name', I'll rename the current 'name' to 'name_age'.
We can then construct a pandas MultiIndex directly from the coordinate values and assign it as a stacked DataArray coordinate.
If you then unstack 'name_age', you'll get the 3-D DataArray you want.
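The three steps above can be sketched as follows. This is a minimal reconstruction, not necessarily the exact code of the original answer: the variable names `index` and `x3d` are mine, and it assumes every label contains exactly one underscore. Newer xarray releases may emit a FutureWarning when a MultiIndex is assigned directly as a coordinate.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Rebuild the array from the question so the example is self-contained.
budgets = np.array([
    [np.nan, 450.],
    [500., 100.],
    [np.nan, 900.],
])
x = xr.DataArray(
    budgets,
    coords={
        'name': ['Jack_teen', 'John_adult', 'John_teen'],
        'hobby': ['books', 'bicyle'],
    },
    dims=['name', 'hobby'],
)

# Step 1: free up the name 'name' by renaming the combined dimension.
x = x.rename({'name': 'name_age'})

# Step 2: split each 'NAME_AGE' label into a (name, age) tuple and
# build a two-level MultiIndex from the result.
index = pd.MultiIndex.from_tuples(
    [tuple(label.split('_')) for label in x.coords['name_age'].values],
    names=['name', 'age'],
)
x.coords['name_age'] = index  # 'name_age' is now a stacked coordinate

# Step 3: unstack the MultiIndex into separate 'name' and 'age' dimensions.
# Combinations absent from the input (here Jack as an adult) become NaN.
x3d = x.unstack('name_age')

print(x3d.sel(name='John', age='adult', hobby='books').item())
```

The result has the dimensions hobby, name and age, so a lookup such as `x3d.sel(name='John', age='teen')` returns John's budgets for every hobby at that age.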