Does Zarr has built-in multi-threading support for fast read and write?

967 views Asked by At

I am trying to speed up reading and writing Zarr files using multi-threading. For example, if I can store an array in 5 chunks, is there a way to use a thread per chunk to speed up reading and writing the array to and from disk (possibly using ThreadSynchronizer() and synchronizer argument?). I just want to speed up read/write. I don't want to parallelize the computation. I know that can be done with dask. Thanks.

2

There are 2 answers

0
Maxime A On

Zarr supports parallel reads and writes to different chunks without any additional synchronization, see documentation. As pointed by the documentation, the bottleneck may be the GIL, in which case using multiple processes (rather than threads) can help speedup the processing.

0
alexod On

Zarr has not, but dask has. So you can dome something like this.

import dask.array as da

zarr_url = "s3://somewhere..."

x = da.from_zarr(zarr_url).compute()