I'm opening a zarr store, rechunking the dataset, and then writing it back out to a different zarr store. Yet when I open the new store back up, it doesn't respect the chunk size I just wrote. Here is the code and the output from Jupyter. Any idea what I'm doing wrong here?
```python
import xarray as xr

bathy_ds = xr.open_zarr('data/bathy_store')
bathy_ds.elevation                      # shows the original chunking
bathy_ds.chunk(5000).elevation          # rechunked to 5000 as expected
bathy_ds.chunk(5000).to_zarr('data/elevation_store')

new_ds = xr.open_zarr('data/elevation_store')
new_ds.elevation                        # chunks have reverted to the original size
```
The new store reverts to the original chunking, as if I'm not fully overwriting it, or as if there's some other setting I need to change.
This seems to be a known issue; there's a fair bit of discussion in the issue's thread and a recently merged PR.
Basically, the dataset carries the original chunking around in the `.encoding` property. So when you call the second write operation, the chunks defined in `ds[var].encoding['chunks']` (if present) will be used to write `var` to zarr. According to the conversation in the GH issue, the current best solution is to manually delete the chunk encoding for the variables in question:
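In code, that deletion looks something like this (a minimal sketch; `ds` here stands for the rechunked dataset you're about to write):

```python
# remove the stale chunk encoding (if present) so the new dask
# chunks are honored by to_zarr
for var in ds.variables:
    ds[var].encoding.pop("chunks", None)
```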
However, it should be noted that this seems to be an evolving situation, so it'd be good to check in on the progress there before settling on a final solution.
Here's a little example that showcases the issue and solution:
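Something along these lines (a minimal sketch: the store paths and the toy 100×100 array are placeholders for illustration, not from the original post):

```python
import numpy as np
import xarray as xr

# write a small dataset with an initial chunk size of 25
ds = xr.Dataset({"elevation": (("x", "y"), np.random.rand(100, 100))})
ds.chunk(25).to_zarr("initial_store", mode="w")

# reopen and rechunk to 50; the old chunk size survives in .encoding
ds2 = xr.open_zarr("initial_store").chunk(50)
print(ds2.elevation.encoding["chunks"])   # (25, 25) -- the stale value

# drop the stale chunk encoding so the new dask chunks are used on write
for var in ds2.variables:
    ds2[var].encoding.pop("chunks", None)

ds2.to_zarr("rechunked_store", mode="w")
print(xr.open_zarr("rechunked_store").elevation.chunks)  # ((50, 50), (50, 50))
```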