How do I use xarray groupby_bins to group by a time array?

I have a multidimensional data object which has one time axis. I need to bin the data according to a regular time series such as hourly or daily (to subsequently calculate correlations within each time bin and get a correlation time series). However, when I try to use groupby_bins I get TypeError: Cannot cast ufunc less input from dtype('<m8[ns]') to dtype('<m8') with casting rule 'same_kind':

# xr is xarray; pd is pandas
In [109]: C = numpy.random.randint(-2000, 2000, dtype='int16', size=(5000, 56, 20))

In [110]: D = xr.DataArray(C, dims=("time", "scanpos", "channel"), coords={"time": pd.date_range("2000-01-01T00:00:00", periods=5000, freq='1min')})

In [111]: D.groupby_bins("time", pd.date_range(*D["time"].data[[0,-1]], freq="1H"))
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-111-7e7cda1ad060> in <module>()
----> 1 D.groupby_bins("time", pd.date_range(*D["time"].data[[0,-1]], freq="1H"))

/dev/shm/gerrit/venv/stable-3.5/lib/python3.5/site-packages/xarray/core/common.py in groupby_bins(self, group, bins, right, labels, precision, include_lowest, squeeze)
    397                                 cut_kwargs={'right': right, 'labels': labels,
    398                                             'precision': precision,
--> 399                                             'include_lowest': include_lowest})
    400 
    401     def rolling(self, min_periods=None, center=False, **windows):

/dev/shm/gerrit/venv/stable-3.5/lib/python3.5/site-packages/xarray/core/groupby.py in __init__(self, obj, group, squeeze, grouper, bins, cut_kwargs)
    190             raise TypeError("Can't specify both `grouper` and `bins`.")
    191         if bins is not None:
--> 192             binned = pd.cut(group.values, bins, **cut_kwargs)
    193             new_dim_name = group.name + '_bins'
    194             group = DataArray(binned, group.coords, name=new_dim_name)

/dev/shm/gerrit/venv/stable-3.5/lib/python3.5/site-packages/pandas/tools/tile.py in cut(x, bins, right, labels, retbins, precision, include_lowest)
    112     else:
    113         bins = np.asarray(bins)
--> 114         if (np.diff(bins) < 0).any():
    115             raise ValueError('bins must increase monotonically.')
    116 

TypeError: Cannot cast ufunc less input from dtype('<m8[ns]') to dtype('<m8') with casting rule 'same_kind'

How can I use a time axis with xarray's groupby_bins? I tried using bins whose dtype matches the time axis, but passing dtype to pd.date_range appears to have no effect. Even when the dtypes are identical (I am not sure why they differ in this toy example, but that is a different question), the error remains.


P.S. I am also happy with a solution that bypasses pd.date_range completely.

There is 1 answer

shoyer On

groupby_bins was intended for numeric data, though there's no inherent reason why it shouldn't work for dates (this is indeed a little confusing). The easiest way to solve your problem of binning dates is to use the resample method:

D.resample(time="1H")  # xarray >= 0.10; older releases used D.resample("1H", dim="time")
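For the per-bin correlation mentioned in the question, a minimal sketch of how resample might be used follows. It assumes a recent xarray release, where resample takes the dimension as a keyword and returns a lazy grouped object that can be either aggregated or iterated over. The choice of channels 0 and 1 and the names hourly, hourly_mean and corr_series are illustrative only and not part of the original question.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Rebuild the toy array from the question (seeded so the run is repeatable).
rng = np.random.default_rng(0)
C = rng.integers(-2000, 2000, size=(5000, 56, 20)).astype("int16")
D = xr.DataArray(
    C,
    dims=("time", "scanpos", "channel"),
    coords={"time": pd.date_range("2000-01-01T00:00:00", periods=5000, freq="1min")},
)

# Lazy hourly grouping; newer pandas prefers the lowercase alias "1h" over "1H".
hourly = D.resample(time="1h")

# Simple aggregation: one mean per hour, scan position and channel.
hourly_mean = hourly.mean()

# Custom per-bin statistic: correlation between two (arbitrarily chosen) channels.
labels, corrs = [], []
for label, chunk in hourly:
    a = chunk.isel(channel=0).values.ravel().astype("float64")
    b = chunk.isel(channel=1).values.ravel().astype("float64")
    labels.append(label)
    corrs.append(np.corrcoef(a, b)[0, 1])

# Correlation time series indexed by the start of each hourly bin.
corr_series = pd.Series(corrs, index=pd.DatetimeIndex(labels))
```

Iterating over the resample object yields (bin label, sub-array) pairs, so any statistic that is not a built-in reduction can be computed bin by bin, without constructing the bin edges with pd.date_range.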