How to treat overlapping time from forecast steps?

89 views Asked by At

Context

I have download u- and v- wind components from: ERA5 reanalysis ("reanalysis-era5-pressure-levels", variable observation) and ECMWF's Seasonal Forecast ("seasonal-original-pressure-levels", variable forecast) for the same year (2021), and I would like to use Continuous Ranked Probability Score (CRPS) from properscoring to evaluate the performance of the forecast.

For evaluation, the idea is to use xr.apply_ufunc over number (members) to calculate, for every: time, latitude and longitude; the CRPS, something like:

import properscoring as ps

crps = xr.apply_ufunc(
    ps.crps_ensemble,
    observation,
    forecast,
    input_core_dims=[[], ["number"]],
    output_core_dims=[[]],
    vectorize=True,
)

However, observation is indexed by ("time", "latitude", "longitude"), while forecast is indexed by ("time", "latitude", "longitude", "steps", "number"); making them not compatible (xr.align(..., join = "exact") fails because of both time and steps).

How can I transform observations's time dimension into ("time", "steps")?

What I have tried

Using backend_kwargs

For another project I used xr.open_dataset(..., backend_kwargs={"time_dims": ("valid_time",)}), which made valid_time match with time dimension and things worked out. However, for this current project, there is overlap between valid_time and time with steps (e.g. time=2021-01-01 + steps=36:00:00 > time=2021-01-02), which makes this solution undesirable, as values are lost.

forecast = xr.open_dataset(
    "data/raw/forecast.grib",
    engine="cfgrib",
    backend_kwargs=dict(time_dims=("valid_time",)),
)
# forecast is missing values corresponding to `steps > 24:00:00`

Concatenating into a new dimension

Another solution that I tried implementing is concatenating this rolling window as different steps: for each day, extract the next 36 hours as steps into a new dimension, steps. However, this did neither create a new dimension (maybe xr.concat is not the correct function?) nor was fast enough (took more than 6 minutes for two months).

xr.concat(
    [
        observation.sel(time = slice(date, date + pd.Timedelta("36H")))
        for date in observation.time.values
    ],
    dim = "steps"
)
0

There are 0 answers