Context
I have download u- and v- wind components from: ERA5 reanalysis ("reanalysis-era5-pressure-levels"
, variable observation
) and ECMWF's Seasonal Forecast ("seasonal-original-pressure-levels"
, variable forecast
) for the same year (2021), and I would like to use Continuous Ranked Probability Score (CRPS) from properscoring to evaluate the performance of the forecast.
For evaluation, the idea is to use xr.apply_ufunc
over number
(members) to calculate, for every: time, latitude and longitude; the CRPS, something like:
import properscoring as ps
crps = xr.apply_ufunc(
ps.crps_ensemble,
observation,
forecast,
input_core_dims=[[], ["number"]],
output_core_dims=[[]],
vectorize=True,
)
However, observation
is indexed by ("time", "latitude", "longitude")
, while forecast
is indexed by ("time", "latitude", "longitude", "steps", "number")
; making them not compatible (xr.align(..., join = "exact")
fails because of both time
and steps
).
How can I transform observations
's time
dimension into ("time", "steps")
?
What I have tried
Using backend_kwargs
For another project I used xr.open_dataset(..., backend_kwargs={"time_dims": ("valid_time",)})
, which made valid_time
match with time
dimension and things worked out. However, for this current project, there is overlap between valid_time
and time
with steps
(e.g. time=2021-01-01 + steps=36:00:00 > time=2021-01-02
), which makes this solution undesirable, as values are lost.
forecast = xr.open_dataset(
"data/raw/forecast.grib",
engine="cfgrib",
backend_kwargs=dict(time_dims=("valid_time",)),
)
# forecast is missing values corresponding to `steps > 24:00:00`
Concatenating into a new dimension
Another solution that I tried implementing is concatenating this rolling window as different steps: for each day, extract the next 36 hours as steps into a new dimension, steps
. However, this did neither create a new dimension (maybe xr.concat
is not the correct function?) nor was fast enough (took more than 6 minutes for two months).
xr.concat(
[
observation.sel(time = slice(date, date + pd.Timedelta("36H")))
for date in observation.time.values
],
dim = "steps"
)