I'm using xarray to read a large netCDF dataset sampled every 6 hours, and I want to downsample it to daily resolution by taking the mean over each day. I have chunked the dataset spatially. After downsampling, each time step ends up in its own chunk.
I've set up a minimal working example to demonstrate the problem.
import numpy as np
import pandas as pd
import xarray as xr

Temp = 20 + 10 * np.random.randn(20, 10, 10)  # make some temperature data
times = pd.date_range("2000-01-01", periods=20)  # daily spaced data
lon = [[i for i in range(10)] for _ in range(10)]
dset = xr.Dataset({"Temp": (["time", "x", "y"], Temp)},
                  {"lon": (["x", "y"], lon), "lat": (["x", "y"], lon), "time": times})
dset = dset.chunk({"x": 5, "y": 5})  # Frozen({'time': (20,), 'x': (5, 5), 'y': (5, 5)})
re_dset = dset.resample(time="5D").mean()  # downsample to 5-day means
re_dset.chunks  # Frozen({'time': (1, 1, 1, 1), 'x': (5, 5), 'y': (5, 5)})
I would expect the chunks of the resampled dataset to be Frozen({'time': (4,), 'x': (5, 5), 'y': (5, 5)}). (Edited to fix a mistake in the expected outcome.)
I think there is just a misunderstanding about what the chunks attribute displays.
You start with 20 days in a single chunk along the time dimension, and two chunks of 5 along each of x and y: Frozen({'time': (20,), 'x': (5, 5), 'y': (5, 5)}).
When you resample by 5 days and take the mean, each 5-day bin is reduced independently, so you end up with 4 chunks of size 1 along the time dimension, one per bin, which is exactly what you observe: Frozen({'time': (1, 1, 1, 1), 'x': (5, 5), 'y': (5, 5)}).
I'm not sure why you would expect a single chunk of size 4 along the time dimension: there are 4 resampled time steps, and each one is the result of its own independent groupby reduction, so each gets its own chunk.
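That said, if a single chunk along time is what you want, you can consolidate the per-bin chunks after the resample with a rechunk. A minimal sketch, reusing the dimensions from the example above (`-1` in `.chunk()` means one chunk spanning the whole dimension):

```python
import numpy as np
import pandas as pd
import xarray as xr

Temp = 20 + 10 * np.random.randn(20, 10, 10)
times = pd.date_range("2000-01-01", periods=20)
lon = [[i for i in range(10)] for _ in range(10)]
dset = xr.Dataset(
    {"Temp": (["time", "x", "y"], Temp)},
    {"lon": (["x", "y"], lon), "lat": (["x", "y"], lon), "time": times},
).chunk({"x": 5, "y": 5})

# resample produces one chunk per 5-day bin along time
re_dset = dset.resample(time="5D").mean()
print(re_dset.chunks["time"])  # (1, 1, 1, 1)

# merge them back into a single chunk spanning the time dimension
re_dset = re_dset.chunk({"time": -1})
print(re_dset.chunks["time"])  # (4,)
```

This only relabels the dask graph; the mean is still computed per bin, and the merge happens lazily when the result is evaluated.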