I'm opening a zarr store, rechunking the dataset, and then writing it back out to a different zarr store. Yet when I open the new store back up, it doesn't respect the chunk size I just wrote. Here is the code and the output from Jupyter. Any idea what I'm doing wrong here?
```python
import xarray as xr

bathy_ds = xr.open_zarr('data/bathy_store')
bathy_ds.elevation                      # shows the original chunking
bathy_ds.chunk(5000).elevation          # rechunked to 5000 as expected
bathy_ds.chunk(5000).to_zarr('data/elevation_store')

new_ds = xr.open_zarr('data/elevation_store')
new_ds.elevation                        # chunks have reverted to the original size
```
The new store reverts to the original chunking, as if I'm not fully overwriting it, or as if there's some other setting I need to change.
This seems to be a known issue; there's a fair bit of discussion in the issue's thread and a recently merged PR.
Basically, the dataset carries the original chunking around in the `.encoding` property. So when you call the second write operation, the chunks defined in `ds[var].encoding['chunks']` (if present) will be used to write `var` to zarr. According to the conversation in the GH issue, the current best solution is to manually delete the chunk encoding for the variables in question:
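In code, that deletion looks something like this (a minimal sketch; `ds` here stands for the rechunked dataset you're about to write):

```python
# remove the stale chunk encoding (if present) so the new dask
# chunks are honored by to_zarr
for var in ds.variables:
    ds[var].encoding.pop("chunks", None)
```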
However, it should be noted that this seems to be an evolving situation, so it'd be good to check in on the progress there before settling on a final solution.
Here's a little example that showcases the issue and solution:
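Something along these lines (a minimal sketch: the store paths and the toy 100×100 array are placeholders for illustration, not from the original post):

```python
import numpy as np
import xarray as xr

# write a small dataset with an initial chunk size of 25
ds = xr.Dataset({"elevation": (("x", "y"), np.random.rand(100, 100))})
ds.chunk(25).to_zarr("initial_store", mode="w")

# reopen and rechunk to 50; the old chunk size survives in .encoding
ds2 = xr.open_zarr("initial_store").chunk(50)
print(ds2.elevation.encoding["chunks"])   # (25, 25) -- the stale value

# drop the stale chunk encoding so the new dask chunks are used on write
for var in ds2.variables:
    ds2[var].encoding.pop("chunks", None)

ds2.to_zarr("rechunked_store", mode="w")
print(xr.open_zarr("rechunked_store").elevation.chunks)  # ((50, 50), (50, 50))
```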