Suppose I have converted a simple two-column dataframe to a NumPy array:
gdf.head()
>>>
rid rast
0 1 01000001000761C3ECF420013F0761C3ECF42001BF7172...
1 2 01000001000761C3ECF420013F0761C3ECF42001BF64BF...
2 3 01000001000761C3ECF420013F0761C3ECF42001BF560C...
3 4 01000001000761C3ECF420013F0761C3ECF42001BF7F25...
4 5 01000001000761C3ECF420013F0761C3ECF42001BF7172...
raster_np = gdf.to_numpy()
raster_np[0]
>>> array([1, '01000001000761C3E.........], dtype=object)
I've been tasked with converting the NumPy array to the Zarr file format. I assume the reasons are that the rast values and the dataframe are large enough that chunking and compression might be necessary, and that .zarr files are better suited to an S3/cloud storage environment. I created a simple Zarr array like so:
import zarr as z
z_test = z.zeros(shape=(10000, 2), chunks=(10000, 2))
z_test
>>> <zarr.core.Array (10000, 2) float64>
Now, how do I get the data in raster_np into z_test and retain the Zarr attributes? Simply using z_test = raster_np obviously doesn't work. Perhaps there is something I am misunderstanding about Zarr. Any suggestions?
Since your initial array is of mixed type (object), you need to create the Zarr array with the correct data type and encode the data. You can use the JSON encoder from numcodecs for this. You will, however, get better performance if you store the rid and rast columns as separate arrays with int and str datatypes respectively, or convert the hex to another base.