Find the longest non-null segment in an ndarray using NumPy


I have an array ab of shape (2, 12):

import numpy as np

ab = np.array([[0,3,6,3,np.nan,3,7,3,5,4,3,np.nan],
               [5,9,np.nan,3,7,5,3,6,4,np.nan,np.nan,np.nan]])

I am trying to get the longest run of consecutive columns in which both rows hold non-null values. For the example above, the output should be:

[[3. 7. 3. 5.]
 [5. 3. 6. 4.]]
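Put another way, a column only counts when neither row is NaN there, so the task reduces to finding the longest run of True values in a one-dimensional mask. A small illustration of that mask for the example array, using only np.isnan and any:

```python
import numpy as np

ab = np.array([[0, 3, 6, 3, np.nan, 3, 7, 3, 5, 4, 3, np.nan],
               [5, 9, np.nan, 3, 7, 5, 3, 6, 4, np.nan, np.nan, np.nan]])

# True where neither row holds a NaN in that column
valid = ~np.isnan(ab).any(axis=0)
print(valid.astype(int))  # [1 1 0 1 0 1 1 1 1 0 0 0]
```

The longest run of 1s covers columns 5 to 8 (0-based), which are exactly the columns in the expected output.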

I adapted the solution proposed for a similar question, "Find longest subsequence without NaN values in set of series", after converting my array into a DataFrame:

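import numpy as np
import pandas as pd

df = pd.DataFrame(ab.T)                      # one DataFrame row per column of ab
seq = np.array(df.dropna(how='any').index)   # positions with no NaN in either row
longest_seq = max(np.split(seq, np.where(np.diff(seq)!=1)[0]+1), key=len)  # longest run of consecutive positions
print(df.iloc[longest_seq])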

    0    1
5  3.0  5.0
6  7.0  3.0
7  3.0  6.0
8  5.0  4.0

However, is it possible to find a solution using numpy only?
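One possible NumPy-only route (a sketch, not taken from the linked question) is to translate each pandas step directly: dropna(how='any').index becomes np.flatnonzero over a column mask, and df.iloc[...] becomes column indexing on ab. The np.diff/np.split logic stays the same.

```python
import numpy as np

ab = np.array([[0, 3, 6, 3, np.nan, 3, 7, 3, 5, 4, 3, np.nan],
               [5, 9, np.nan, 3, 7, 5, 3, 6, 4, np.nan, np.nan, np.nan]])

# Indices of columns where neither row is NaN (NumPy analogue of dropna(how='any').index)
seq = np.flatnonzero(~np.isnan(ab).any(axis=0))

# Split wherever consecutive indices are not adjacent, then keep the longest piece
runs = np.split(seq, np.where(np.diff(seq) != 1)[0] + 1)
longest_seq = max(runs, key=len)

print(ab[:, longest_seq])
```

As with the pandas version, max returns the first of several equally long runs. If no column is fully non-null, seq is empty and the result has shape (2, 0).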

Thanks

Answer by MayeulC:

I am not sure your code handles the case where the longest run has a different length in each row. Instead, I would proceed row by row:

import numpy as np

res = []
for array in ab:
    # Prepend a NaN so that every run of values is preceded by a NaN
    arr = np.append(np.nan, array)
    nanindexes = np.nonzero(np.isnan(arr))[0]
    # Split at every NaN; apart from a possibly empty first piece, each piece starts with a NaN
    longest = max(np.split(arr, nanindexes), key=len)
    longest = longest[1:]  # drop the leading NaN (the one we added, or the one that ended the previous run)
    res.append(longest)

print(res)
[array([3., 7., 3., 5., 4., 3.]), array([3., 7., 5., 3., 6., 4.])]
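To see why prepending a NaN helps, it may be useful to look at the pieces np.split produces for the first row. Splitting at every NaN index gives an empty first piece (the prepended NaN sits at index 0), then one piece per run, each starting with the NaN that precedes it, so dropping the first element of the longest piece leaves just the values:

```python
import numpy as np

row = np.array([0, 3, 6, 3, np.nan, 3, 7, 3, 5, 4, 3, np.nan])

# Prepend a NaN so that every run of values is preceded by exactly one NaN
arr = np.append(np.nan, row)
nanindexes = np.nonzero(np.isnan(arr))[0]
pieces = np.split(arr, nanindexes)

print([len(p) for p in pieces])  # [0, 5, 7, 1]
print(max(pieces, key=len)[1:])
```

The trailing NaN produces a one-element piece, which never wins against a real run.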

I am not too familiar with numpy, so I took your question as an exercise. There are probably many ways to improve that code.
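A loop-free alternative, which is a different technique from the np.split approach above, is to pad a boolean validity mask with False on both ends and use np.diff to find where runs start (+1) and end (-1). The helper name longest_valid_run is hypothetical. This sketch assumes the goal is the joint run across all rows, as in the question:

```python
import numpy as np

def longest_valid_run(a):
    """Hypothetical helper: columns of 2-D `a` forming the longest run with no NaN in any row.

    Ties go to the earliest run.
    """
    valid = ~np.isnan(a).any(axis=0)
    # Pad with False so every run has both a rising and a falling edge
    padded = np.concatenate(([False], valid, [False])).astype(np.int8)
    edges = np.diff(padded)
    starts = np.flatnonzero(edges == 1)   # first column of each run
    ends = np.flatnonzero(edges == -1)    # one past the last column of each run
    if starts.size == 0:
        return a[:, :0]                   # no fully non-null column at all
    k = np.argmax(ends - starts)          # argmax returns the first longest run
    return a[:, starts[k]:ends[k]]

ab = np.array([[0, 3, 6, 3, np.nan, 3, 7, 3, 5, 4, 3, np.nan],
               [5, 9, np.nan, 3, 7, 5, 3, 6, 4, np.nan, np.nan, np.nan]])
print(longest_valid_run(ab))
```

Passing a single row as row[np.newaxis, :] gives the per-row result computed in this answer, for example [3, 7, 3, 5, 4, 3] for the first row.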