python-xarray: How to create a Dataset and assign results of an iteration to the Dataset?

I have a for loop which is running some analysis on some data and returning some values. For boring reasons, this loop cannot easily be vectorised. I want to create a Dataset and then assign the result of the for loop to the Dataset as I iterate through.

Dataset.update

If I write some code which uses Dataset.update as follows:

import numpy as np
from xarray import Dataset, cftime_range

times = cftime_range(start="2024-01-01", end="2024-01-02", freq="H")

test_xarray = Dataset(coords={"time": None, "mlt": np.arange(24)})

for time in times:
    test_for_this_time = Dataset({"x": (["time", "mlt"], np.random.random((1, 24)))},
                                 coords={"time": np.array([time]), "mlt": np.arange(24)})
    test_xarray.update(test_for_this_time)

print(test_xarray)

I get the following:

<xarray.Dataset>
Dimensions:  (time: 1, mlt: 24)
Coordinates:
  * time     (time) object 2024-01-01 00:00:00
  * mlt      (mlt) int64 0 1 2 3 4 5 6 7 8 9 ... 14 15 16 17 18 19 20 21 22 23
Data variables:
    x        (time, mlt) float64 nan nan nan nan nan nan ... nan nan nan nan nan

Dataset.merge

The update result is clearly not what I want: only the first time step survives and x is all NaN. So I tried using Dataset.merge instead of Dataset.update.

import numpy as np
from xarray import Dataset, cftime_range

times = cftime_range(start="2024-01-01", end="2024-01-02", freq="H")

test_xarray = Dataset(coords={"time": None, "mlt": np.arange(24)})

for time in times:
    test_for_this_time = Dataset({"x": (["time", "mlt"], np.random.random((1, 24)))},
                                 coords={"time": np.array([time]), "mlt": np.arange(24)})
    test_xarray = test_xarray.merge(test_for_this_time)

print(test_xarray)

I get the following:

<xarray.Dataset>
Dimensions:  (time: 25, mlt: 24)
Coordinates:
  * time     (time) object 2024-01-01 00:00:00 ... 2024-01-02 00:00:00
  * mlt      (mlt) int64 0 1 2 3 4 5 6 7 8 9 ... 14 15 16 17 18 19 20 21 22 23
Data variables:
    x        (time, mlt) float64 0.6399 0.6227 0.7972 ... 0.7804 0.8763 0.7198

This does do what I want, so hurrah, but I don't understand what I did wrong in the first method, which I would have expected to work.

Is this the best method?

I'm curious as to whether I'm using xarray in the best way here. I've looked through Stack Overflow and through the documentation and I can't see any examples of this sort of workflow. I've also tried with xarray.concat, but that doesn't quite seem to do what I want; it leaves the first None value in the time dimension. It might be that the method above is the best way, but if not, I would greatly appreciate any advice on how better to do it.

There is 1 answer

Answered by Markus

I would argue that the proposed methods with Dataset.update and Dataset.merge are not ideal. It shouldn't be necessary to create a new Dataset in every iteration of the for-loop, with the sole purpose of adding new data to an existing Dataset.
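On why the Dataset.update version ends up with a single time step full of NaN: as far as I can tell, update aligns the incoming dataset to the indexes the target already has rather than taking their union. A piece with a new time label is therefore reindexed onto the existing label (which yields NaN) and then overwrites x. merge outer-joins the indexes instead. Below is a small sketch of that difference. It assumes plain pandas timestamps instead of cftime, and the names t0, t1 and make_piece are made up for illustration. The join and compat arguments are spelled out so the result does not depend on version defaults.

```python
import numpy as np
import pandas as pd
import xarray as xr

mlt = np.arange(3)
t0 = pd.Timestamp("2024-01-01 00:00")
t1 = pd.Timestamp("2024-01-01 01:00")


def make_piece(time, value):
    """Hypothetical helper: a one-time-step Dataset filled with `value`."""
    return xr.Dataset(
        {"x": (["time", "mlt"], np.full((1, mlt.size), value))},
        coords={"time": [time], "mlt": mlt},
    )


# update: works in place and keeps the time index of the target dataset.
updated = make_piece(t0, 1.0)
updated.update(make_piece(t1, 2.0))

# merge: returns a new dataset whose time index is the union of both.
merged = make_piece(t0, 1.0).merge(
    make_piece(t1, 2.0), join="outer", compat="no_conflicts"
)

# Number of time steps, and whether x is entirely NaN / entirely filled.
print(updated.sizes["time"], bool(updated["x"].isnull().all()))
print(merged.sizes["time"], bool(merged["x"].notnull().all()))
```

If this reading is right, it would also explain why the loop in the question keeps only the first time label: every later piece is reindexed onto it.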

In your example, the coordinates over which you iterate are known before the for-loop. Therefore, my suggestion is to first create a Dataset containing a DataArray of the correct shape but unfilled (or filled with dummy values), and then fill in the values in the for-loop:

import numpy as np
from xarray import Dataset, cftime_range

times = cftime_range(start="2024-01-01", end="2024-01-02", freq="H")
mlt = np.arange(24)

# Preallocate x with its final shape; np.empty leaves the values uninitialised.
test_xarray = Dataset(
    {"x": (["time", "mlt"], np.empty((times.size, mlt.size)))},
    coords={"time": times, "mlt": mlt},
)

# Overwrite one time step (one row of x) per iteration.
for i, time in enumerate(times):
    test_xarray.x[i] = np.random.random(mlt.size)
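
If positional indexing feels fragile, the same fill can be done by label with .loc, and pre-filling with NaN instead of np.empty makes any skipped step easy to spot afterwards. A minimal sketch, assuming pandas timestamps rather than cftime so that it runs without the optional cftime package; run_analysis is a hypothetical stand-in for the real per-step analysis:

```python
import numpy as np
import pandas as pd
import xarray as xr

# 25 hourly steps, matching the 2024-01-01 to 2024-01-02 range in the question.
times = pd.date_range("2024-01-01", periods=25, freq=pd.Timedelta(hours=1))
mlt = np.arange(24)
rng = np.random.default_rng(0)


def run_analysis(time):
    """Hypothetical placeholder for the real analysis of one time step."""
    return rng.random(mlt.size)


# Preallocate with NaN so that unfilled time steps remain detectable.
result = xr.Dataset(
    {"x": (["time", "mlt"], np.full((times.size, mlt.size), np.nan))},
    coords={"time": times, "mlt": mlt},
)

# Assign by time label instead of by integer position.
for time in times:
    result["x"].loc[{"time": time}] = run_analysis(time)

# Count values that were never written.
missing = int(result["x"].isnull().sum())
print(missing)
```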

Would this be feasible for your application?
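
If the time steps were not known before the loop, one alternative would be to collect the per-step Datasets in a plain Python list and call xarray.concat once at the end. Starting from an empty list rather than a placeholder Dataset should also avoid the stray None entry you saw with concat. This is only a sketch under the same assumptions as before (pandas timestamps instead of cftime, random numbers in place of the real analysis):

```python
import numpy as np
import pandas as pd
import xarray as xr

mlt = np.arange(24)
rng = np.random.default_rng(0)
start = pd.Timestamp("2024-01-01")

pieces = []
for hour in range(25):
    time = start + pd.Timedelta(hours=hour)
    values = rng.random((1, mlt.size))  # placeholder for the real analysis
    pieces.append(
        xr.Dataset(
            {"x": (["time", "mlt"], values)},
            coords={"time": [time], "mlt": mlt},
        )
    )

# A single concatenation along time, instead of one merge per iteration.
combined = xr.concat(pieces, dim="time")
print(dict(combined.sizes))
```

Concatenating once keeps the loop body cheap, since each merge inside the loop has to align and copy everything accumulated so far.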