Speeding up finding the longest continuous sequence/run of values


I have a 30-year masked dataset containing values for days on which a heatwave was observed. I want to calculate the longest heatwave event in each grid cell. The code below works when I slice the dataset to one year, taking 2 or 3 minutes to produce the result. With the full 30 years of data, however, it is still running after an hour. How can I speed this up?

ff is an xarray.DataArray named 't2m' with three dimensions: time: 10958, latitude: 27, longitude: 21

fs = hw.where(hw.rolling(time=3).count()==3).sel(time=slice('1991','2020'))
ff = fs

fs
output:
xarray.DataArray 't2m' (time: 10958, latitude: 27, longitude: 21)
array([[[nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],

is_heatwave = fs > 0

is_heatwave
output:
xarray.DataArray 't2m' (time: 10958, latitude: 27, longitude: 21)
array([[[False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],


tim = 0
lati = 0
longi = 0

for t in range(1, ff.shape[0]*ff.shape[1]*ff.shape[2]):
    # Binarize: values above 1 become 1, everything else (including NaN) becomes 0
    if np.any(ff[tim, lati, longi] > 1):
        ff[tim, lati, longi] = 1

    else:
        ff[tim, lati, longi] = 0

    longi = longi + 1
    if longi == 21:
        longi = 0
        lati = lati + 1
    
    if lati == 27:
        lati = 0
        tim = tim + 1
    
    if tim == 10959:
        break


tim = 0
lati = 0
longi = 0

for t in range(1, ff.shape[0]*ff.shape[1]*ff.shape[2]):
    # Add the current value with the previous time step's value
    if np.any(ff[tim, lati, longi] == 1):
        ff[tim, lati, longi] = ff[tim, lati, longi] + ff[tim-1, lati, longi]

    else:
        ff[tim, lati, longi] = ff[tim, lati, longi]

    longi = longi + 1
    if longi == 21:
        longi = 0
        lati = lati + 1
    
    if lati == 27:
        lati = 0
        tim = tim + 1
    
    if tim == 10959:
        break

ff_max = ff.max(axis=0)

There are 2 answers

Guapi-zh

The code is slow because it visits every element in a Python loop and assigns into the DataArray one value at a time. Vectorized xarray operations can achieve the same result much faster. Here's an example:

import numpy as np
import xarray as xr

# Assuming 'hw' is your xarray.DataArray with the heatwave data
# Create a mask for heatwaves of length 3
hw_mask = (hw > 1).rolling(time=3).sum() == 3

# Select the desired time slice
ff = hw_mask.sel(time=slice('1991', '2020'))

# Convert the boolean mask to integers
ff = ff.astype(int)

# Calculate the maximum heatwave count over the entire time period
ff_max = ff.sum(dim='time').max()
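
As a rough illustration of why vectorizing helps, the sketch below binarizes a small random array twice: once element by element, as the first loop in the question does, and once with a single NumPy comparison. The array shape and the name `values` are made up for the example and only stand in for the real data. NaN compares as False, so masked cells become 0 in both versions.

```python
import numpy as np

# Hypothetical stand-in for the data: (time, latitude, longitude)
rng = np.random.default_rng(0)
values = rng.uniform(0.0, 3.0, size=(10, 4, 5))
values[0, 0, 0] = np.nan  # a masked cell

# Element-by-element version: one Python-level operation per cell
looped = np.zeros(values.shape, dtype=int)
for t in range(values.shape[0]):
    for i in range(values.shape[1]):
        for j in range(values.shape[2]):
            looped[t, i, j] = 1 if values[t, i, j] > 1 else 0

# Vectorized version: one comparison over the whole array
vectorized = (values > 1).astype(int)

# Both approaches should produce identical arrays
same = np.array_equal(looped, vectorized)
```

The same comparison works directly on an xarray.DataArray, so the explicit index bookkeeping (`tim`, `lati`, `longi`) is not needed for this step.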
JonasV

You are looking to find the longest run of True values in your array. You can do it by using this function:

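def n_longest_consecutive(data, dim='time'):
    # Running count of True values along dim
    cumsum = data.cumsum(dim=dim)
    # Subtract the count at the most recent False so each run restarts from 1
    run_length = cumsum - cumsum.where(data == 0).ffill(dim=dim).fillna(0)
    return run_length.max(dim=dim)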

Before you apply the function, you have to convert the xarray DataArray into a boolean array (only True or False values). This can be done like this:

is_heatwave = fs > 0
longest_run = n_longest_consecutive(is_heatwave)
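
To see how the cumulative-sum trick works on concrete numbers, here is a minimal NumPy-only sketch of the same idea. The function name `longest_run_along_time` and the toy array `demo` are made up for this illustration, and a running maximum stands in for `ffill`, which works because a cumulative sum of booleans never decreases.

```python
import numpy as np

def longest_run_along_time(mask):
    """Longest run of True values along axis 0 of a boolean array."""
    mask = np.asarray(mask, dtype=bool)
    # Running count of True values along the time axis
    csum = np.cumsum(mask, axis=0)
    # Count as of the most recent False; the running maximum plays the
    # role of a forward fill because csum is non-decreasing
    last_reset = np.maximum.accumulate(np.where(mask, 0, csum), axis=0)
    # Length of the run ending at each time step, then the maximum per cell
    return (csum - last_reset).max(axis=0)

# Toy data with shape (time=7, latitude=1, longitude=2)
demo = np.array([
    [True, False],
    [True, False],
    [False, True],
    [True, True],
    [True, False],
    [True, False],
    [False, True],
])[:, np.newaxis, :]

result = longest_run_along_time(demo)
print(result)  # [[3 2]]
```

For the first cell the cumulative sum is 1, 2, 2, 3, 4, 5, 5 and the reset values are 0, 0, 2, 2, 2, 2, 5, so the per-step run lengths are 1, 2, 0, 1, 2, 3, 0 and the longest run is 3. Assuming time is the first dimension, as in the question, something like `longest_run_along_time(is_heatwave.values)` should give one value per grid cell.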