I have a 30-year masked dataset containing values for observed heatwave days. I want to calculate the longest heatwave event in each grid cell. The code below works fine when I slice the dataset to one year, taking 2 or 3 minutes to produce the result. With the full 30 years, however, it is still running after an hour. Can anyone help me solve this?
ff is an xarray.DataArray ('t2m') with three dimensions: time: 10958, latitude: 27, longitude: 21.
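import numpy as np  # needed by the loops further down

# hw is the masked heatwave DataArray. Keep only the days whose trailing
# 3-day window is fully non-NaN, then restrict to 1991-2020.
fs = hw.where(hw.rolling(time=3).count() == 3).sel(time=slice('1991', '2020'))
ff = fs  # ff is a second name for the same object, not a copy
fs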
Output:
xarray.DataArray 't2m' time: 10958 latitude: 27 longitude: 21
array([[[nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
is_heatwave = fs > 0
is_heatwave
Output:
xarray.DataArray 't2m' time: 10958 latitude: 27 longitude: 21
array([[[False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
tim = 0
lati = 0
longi = 0
# First pass: turn every cell into a 0/1 heatwave flag.
# NaN > 1 is False, so masked cells become 0.
for t in range(ff.shape[0]*ff.shape[1]*ff.shape[2]):  # one iteration per cell
    if np.any(ff[tim, lati, longi] > 1):
        ff[tim, lati, longi] = 1
    else:
        ff[tim, lati, longi] = 0
    # Advance to the next cell: longitude first, then latitude, then time
    longi = longi + 1
    if longi == 21:
        longi = 0
        lati = lati + 1
    if lati == 27:
        lati = 0
        tim = tim + 1
    if tim == ff.shape[0]:
        break
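tim = 0
lati = 0
longi = 0
# Second pass: turn the 0/1 flags into running counts of consecutive heatwave days
for t in range(ff.shape[0]*ff.shape[1]*ff.shape[2]):  # one iteration per cell
    # Add the previous time step's count to the current flag.
    # Skip tim == 0, where tim-1 would wrap around to the last time step.
    if tim > 0 and np.any(ff[tim, lati, longi] == 1):
        ff[tim, lati, longi] = ff[tim, lati, longi] + ff[tim-1, lati, longi]
    # Advance to the next cell: longitude first, then latitude, then time
    longi = longi + 1
    if longi == 21:
        longi = 0
        lati = lati + 1
    if lati == 27:
        lati = 0
        tim = tim + 1
    if tim == ff.shape[0]:
        break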
ff_max = ff.max(axis=0)
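The code is slow because it visits every element in a Python-level loop. The array has 10958 × 27 × 21 ≈ 6.2 million cells, each of the two passes touches all of them, and every ff[tim, lati, longi] read or write goes through xarray's indexing machinery. You can use vectorized operations on the whole array to achieve the same result much faster. Here's an example: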
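A minimal sketch of the vectorized idea, assuming the heatwave days are available as a boolean array with time on the first axis (for instance is_heatwave from the question). The helper name longest_run and the small demo array are my own and not part of your code. It replaces both loops with a cumulative sum that is reset at every non-heatwave day:

```python
import numpy as np


def longest_run(mask, axis=0):
    """Length of the longest run of consecutive True values along `axis`."""
    mask = np.asarray(mask, dtype=bool)
    # Running count of True values along the axis.
    running = np.cumsum(mask, axis=axis)
    # The running count as it stood at the most recent False step
    # (0 if there has been no False yet). Because `running` never
    # decreases, a cumulative maximum carries that value forward.
    at_last_false = np.maximum.accumulate(np.where(mask, 0, running), axis=axis)
    # The difference is the length of the run ending at each step;
    # its maximum is the longest run.
    return (running - at_last_false).max(axis=axis)


# Tiny made-up example: 8 days (rows) for 2 grid cells (columns).
demo = np.array(
    [
        [False, True],
        [True, True],
        [True, False],
        [False, False],
        [True, True],
        [True, True],
        [True, True],
        [False, True],
    ]
)
result = longest_run(demo, axis=0)
print(result)  # [3 4]
```

With the arrays from the question, longest_run(is_heatwave.values, axis=0) should return a plain NumPy array of shape (27, 21). If you want to keep the latitude/longitude coordinates, one option is xr.apply_ufunc(longest_run, is_heatwave, input_core_dims=[['time']], kwargs={'axis': -1}), which moves time to the last axis before calling the function. Everything here is whole-array arithmetic, so there is no per-cell Python loop.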