How to use xarray like a pandas Panel when adding new items


I have converted a pandas Panel to xarray, but I cannot add new items or extend the major and minor axes as easily as I can with a pandas Panel. The code is below:

import numpy as np
import pandas as pd
import xarray as xr


panel = pd.Panel(np.random.randn(3, 4, 5), items=['one', 'two', 'three'], 
                 major_axis=pd.date_range('1/1/2000', periods=4),
                 minor_axis=['a', 'b', 'c', 'd','e'])

If I want to add a new item, for example, I can do:

panel['four'] = pd.DataFrame(np.ones((4, 5)), index=pd.date_range('1/1/2000', periods=4), columns=['a', 'b', 'c', 'd', 'e'])

panel.four

            a   b   c   d   e
2000-01-01  1.0 1.0 1.0 1.0 1.0
2000-01-02  1.0 1.0 1.0 1.0 1.0
2000-01-03  1.0 1.0 1.0 1.0 1.0
2000-01-04  1.0 1.0 1.0 1.0 1.0

I have difficulty adding items or extending the major/minor axes in xarray:

px=panel.to_xarray()

# px gives me
<xarray.DataArray (items: 3, major_axis: 5, minor_axis: 4)>
array([[[-0.440081, -0.888226,  0.158702,  2.107577],
        [ 0.917835, -0.174557,  0.501626,  0.116761],
        [ 0.406988,  1.95184 , -1.345948,  2.960774],
        [-1.905529,  0.25793 ,  0.076162,  1.954012],
        [ 0.499675,  1.87567 , -1.698771, -1.143766]],

       [[ 0.070269, -1.151737, -0.344155, -0.506383],
        [-2.199357, -0.040909,  0.491984, -0.333431],
        [-0.113155, -0.668475,  2.366683, -0.421863],
        [-0.567336, -0.302224,  1.638386, -0.038545],
        [ 0.55067 , -0.409266, -0.27916 , -0.942144]],

       [[ 1.269171, -0.151471, -0.664072,  0.269168],
        [-0.486492,  0.59632 , -0.191977,  0.22537 ],
        [ 0.069231, -0.345793, -0.450797, -2.982   ],
        [-0.42338 , -0.849736,  0.965738, -0.544596],
        [-1.455378, -0.256441, -1.204572, -0.347749]]])
Coordinates:
  * items       (items) object 'one' 'two' 'three'
  * major_axis  (major_axis) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03 ...
  * minor_axis  (minor_axis) object 'a' 'b' 'c' 'd'


# How should I add a fourth item, or add/delete entries along the major and minor axes?

There are 2 answers

Answer by user8585402:

Assignment in xarray is not as elegant as with a pandas Panel. Let's say we want to add a fourth item to the DataArray above. Here is how it works:

four=xr.DataArray(np.ones((1,4,5)), coords=[['four'],pd.date_range('1/1/2000', periods=4),['a', 'b', 'c', 'd','e']], 
                  dims=['items','major_axis','minor_axis'])

pxc=xr.concat([px,four],dim='items')
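Because `pd.Panel` has been removed from recent pandas releases, `panel.to_xarray()` may not be available to you. The sketch below assumes you build an equivalent DataArray directly (the random data and the name `px` only mirror the question) and then adds the fourth item with `xr.concat` exactly as described above.

```python
import numpy as np
import pandas as pd
import xarray as xr

dates = pd.date_range('1/1/2000', periods=4)
minor = ['a', 'b', 'c', 'd', 'e']

# Build the DataArray directly instead of going through pd.Panel.to_xarray()
px = xr.DataArray(np.random.randn(3, 4, 5),
                  coords=[['one', 'two', 'three'], dates, minor],
                  dims=['items', 'major_axis', 'minor_axis'])

# The new item needs a length-1 'items' dimension so it can be stacked onto px
four = xr.DataArray(np.ones((1, 4, 5)),
                    coords=[['four'], dates, minor],
                    dims=['items', 'major_axis', 'minor_axis'])

# concat allocates a new, larger array; px itself is left unchanged
pxc = xr.concat([px, four], dim='items')
print(pxc.sizes)
print(pxc['items'].values)
```

The other coordinates of `four` must match those of `px`; otherwise `concat` aligns them and pads the mismatches with NaN.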

The same logic applies whether you are adding along items or along the major/minor axes. To delete entries, use:

pxc.drop_sel(items=['four'])  # xarray >= 0.16; in older versions: pxc.drop(['four'], dim='items')
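The same `xr.concat` pattern extends the other dimensions. The sketch below rebuilds a small hypothetical `px` so it runs on its own, appends one extra date along `major_axis` and one extra label `'f'` along `minor_axis`, then deletes labels with `drop_sel`. It assumes xarray 0.16 or later, where `drop_sel` supersedes the older `drop(labels, dim=...)` form.

```python
import numpy as np
import pandas as pd
import xarray as xr

dims = ['items', 'major_axis', 'minor_axis']
items = ['one', 'two', 'three']
dates = pd.date_range('1/1/2000', periods=4)
minor = ['a', 'b', 'c', 'd', 'e']

px = xr.DataArray(np.random.randn(3, 4, 5), coords=[items, dates, minor], dims=dims)

# Append one new date: the new block must match px on every other dimension
new_day = xr.DataArray(np.zeros((3, 1, 5)),
                       coords=[items, pd.DatetimeIndex(['2000-01-05']), minor],
                       dims=dims)
longer = xr.concat([px, new_day], dim='major_axis')

# Append one new minor-axis label 'f' the same way
col_f = xr.DataArray(np.zeros((3, 5, 1)),
                     coords=[items, longer['major_axis'].values, ['f']],
                     dims=dims)
wider = xr.concat([longer, col_f], dim='minor_axis')

# Delete labels along one or several dimensions in a single call
trimmed = wider.drop_sel(items=['three'], minor_axis=['a'])
print(trimmed.sizes)
```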
Answer by shoyer:

xarray.DataArray is based on a single NumPy array internally, so it cannot be efficiently resized or appended to. Your best option is to make a new, larger DataArray with xarray.concat.
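Another way to get a larger array, not mentioned in this answer, is `reindex`. It also returns a new object (so the point about not resizing in place still holds), padding any new labels with NaN, which you can then fill through `.loc`. A minimal sketch, assuming a small hypothetical `px` built directly rather than from a Panel:

```python
import numpy as np
import pandas as pd
import xarray as xr

px = xr.DataArray(np.random.randn(3, 4, 5),
                  coords=[['one', 'two', 'three'],
                          pd.date_range('1/1/2000', periods=4),
                          ['a', 'b', 'c', 'd', 'e']],
                  dims=['items', 'major_axis', 'minor_axis'])

# reindex allocates a new, larger array; the new label 'four' is filled with NaN
bigger = px.reindex(items=['one', 'two', 'three', 'four'])

# Fill the new slot by label on the new array
bigger.loc[dict(items='four')] = 1.0

print(px.sizes['items'], bigger.sizes['items'])  # prints: 3 4 (px is unchanged)
```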

If you want to add items the way you would with a pd.Panel, the data structure you're probably looking for is xarray.Dataset. These are easiest to construct from the MultiIndexed DataFrame equivalent to a Panel:

# First, make a DataFrame with a MultiIndex
>>> df = panel.to_frame()

>>> df.head()
                       one       two     three
major      minor
2000-01-01 a      0.278958  0.676034 -1.544726
           b     -0.918150 -2.707339 -0.552987
           c      0.023479  0.175528 -0.817556
           d      1.798001 -0.142016  1.390834
           e      0.256575  0.265369 -1.829766

# Now, convert the DataFrame with a MultiIndex to xarray
>>> ds = df.to_xarray()

>>> ds
<xarray.Dataset>
Dimensions:  (major: 4, minor: 5)
Coordinates:
  * major    (major) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03 2000-01-04
  * minor    (minor) object 'a' 'b' 'c' 'd' 'e'
Data variables:
    one      (major, minor) float64 0.279 -0.9182 0.02348 1.798 0.2566 2.41 ...
    two      (major, minor) float64 0.676 -2.707 0.1755 -0.142 0.2654 ...
    three    (major, minor) float64 -1.545 -0.553 -0.8176 1.391 -1.83 ...

# You can assign a DataFrame if it has the right column/index names
>>> ds['four'] = pd.DataFrame(np.ones((4,5)),
...                           index=pd.date_range('1/1/2000', periods=4, name='major'),
...                           columns=pd.Index(['a', 'b', 'c', 'd', 'e'], name='minor'))

# or just pass a tuple directly:
>>> ds['five'] = (('major', 'minor'), np.zeros((4, 5)))

>>> ds
<xarray.Dataset>
Dimensions:  (major: 4, minor: 5)
Coordinates:
  * major    (major) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03 2000-01-04
  * minor    (minor) object 'a' 'b' 'c' 'd' 'e'
Data variables:
    one      (major, minor) float64 0.279 -0.9182 0.02348 1.798 0.2566 2.41 ...
    two      (major, minor) float64 0.676 -2.707 0.1755 -0.142 0.2654 ...
    three    (major, minor) float64 -1.545 -0.553 -0.8176 1.391 -1.83 ...
    four     (major, minor) float64 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 ...
    five     (major, minor) float64 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ...
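Deleting items and changing the shared axes are also one-liners on a Dataset. A minimal sketch, assuming a small hypothetical `ds` with the same `major`/`minor` dimensions as above: `drop_vars` removes an item, `drop_sel` removes labels along a dimension from every variable at once, and `reindex` extends a dimension, padding every variable with NaN.

```python
import numpy as np
import pandas as pd
import xarray as xr

major = pd.date_range('1/1/2000', periods=4)
minor = ['a', 'b', 'c', 'd', 'e']
ds = xr.Dataset(
    {name: (('major', 'minor'), np.random.randn(4, 5)) for name in ['one', 'two', 'three']},
    coords={'major': major, 'minor': minor},
)

# Remove an item (a data variable)
ds = ds.drop_vars('three')

# Remove a label along the minor dimension from every variable
ds = ds.drop_sel(minor=['e'])

# Extend the major dimension by one day; the new row is NaN in every variable
ds = ds.reindex(major=pd.date_range('1/1/2000', periods=5))
print(ds)
```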

For more on transitioning from pandas.Panel to xarray, read this section in the xarray docs: http://xarray.pydata.org/en/stable/pandas.html#transitioning-from-pandas-panel-to-xarray
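If you already have a DataArray like `px` (for example because your pandas version no longer ships `pd.Panel`, so there is no `to_frame()` to call), a hedged alternative to the DataFrame route is `DataArray.to_dataset(dim='items')`, which turns each label along `items` into its own data variable. The DataArray below is a hypothetical stand-in for `px`:

```python
import numpy as np
import pandas as pd
import xarray as xr

px = xr.DataArray(np.random.randn(3, 4, 5),
                  coords=[['one', 'two', 'three'],
                          pd.date_range('1/1/2000', periods=4),
                          ['a', 'b', 'c', 'd', 'e']],
                  dims=['items', 'major_axis', 'minor_axis'])

# Each label along 'items' becomes a separate data variable
ds = px.to_dataset(dim='items')

# Adding an item is now a plain assignment, much like panel['four'] = ...
ds['four'] = (('major_axis', 'minor_axis'), np.ones((4, 5)))
print(list(ds.data_vars))
```

Here the dimension names stay `major_axis`/`minor_axis` rather than the `major`/`minor` names produced by `to_frame()`.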