How to count distance to the previous zero in pandas series?

Question

2.2k views · Asked by Roman on 09 June 2015 at 11:41

I have the following pandas series (represented as a list):

[7,2,0,3,4,2,5,0,3,4]

I would like to define a new series that returns distance to the last zero. It means that I would like to have the following output:

[1,2,0,1,2,3,4,0,1,2]

How to do it in pandas in the most efficient way?

Original Q&A

There are 8 answers

**Ami Tavory** · Answer 1 · 2015-06-09T11:49:28+00:00

It's sometimes surprising to see how simple it is to get C-like speeds for this stuff using Cython. Assuming your column's .values gives arr, then:

import numpy as np

# 1-D typed memoryviews; long long matches the int64 values pandas uses by default
cdef long long[:] arr_view = arr
ret = np.zeros_like(arr)
cdef long long[:] ret_view = ret

cdef Py_ssize_t i
cdef long long zero_count = 0
for i in range(len(ret)):
    zero_count = 0 if arr_view[i] == 0 else zero_count + 1
    ret_view[i] = zero_count

Note the use of typed memoryviews, which are extremely fast. You can speed it up further by putting this loop in a function decorated with @cython.boundscheck(False).
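
For reference, the same counting loop can be sketched in plain Python. This is only an illustrative baseline for checking faster versions against, not part of the Cython answer; the function name `distance_to_previous_zero` is made up for this example.

```python
def distance_to_previous_zero(values):
    """Count steps since the last zero; the start of the data acts as an implicit zero."""
    out = []
    count = 0
    for v in values:
        # Reset at a zero, otherwise extend the current run by one
        count = 0 if v == 0 else count + 1
        out.append(count)
    return out


result = distance_to_previous_zero([7, 2, 0, 3, 4, 2, 5, 0, 3, 4])
print(result)  # [1, 2, 0, 1, 2, 3, 4, 0, 1, 2]
```

The loop is O(n), but every iteration runs in the interpreter, which is the overhead that the typed Cython version avoids.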

**Alex Riley** · Answer 2 · 2015-06-09T12:04:53+00:00

A solution in Pandas is a little bit tricky, but could look like this (s is your Series):

>>> x = (s != 0).cumsum()
>>> y = x != x.shift()
>>> y.groupby((y != y.shift()).cumsum()).cumsum()
0    1
1    2
2    0
3    1
4    2
5    3
6    4
7    0
8    1
9    2
dtype: int64

For the last step, this uses the "itertools.groupby" recipe from the pandas cookbook.
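
To see why this works, it may help to lay the intermediate Series side by side. The sketch below reuses `s`, `x` and `y` from the answer; the names `run_id` and `steps` are introduced here only for illustration.

```python
import pandas as pd

s = pd.Series([7, 2, 0, 3, 4, 2, 5, 0, 3, 4])

x = (s != 0).cumsum()               # running count of non-zero values; stalls at each zero
y = x != x.shift()                  # True where the count advanced, i.e. where s is non-zero
run_id = (y != y.shift()).cumsum()  # one label per run of consecutive equal values in y

steps = pd.DataFrame({"s": s, "x": x, "y": y, "run_id": run_id})
# Summing the booleans within each run counts up over non-zero runs and stays 0 on zeros
steps["distance"] = y.groupby(run_id).cumsum().astype(int)
print(steps)
```

Each run of True values in `y` is a stretch of non-zero entries, so the cumulative sum inside a run is the distance to the zero (or to the start of the Series) that precedes it.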

**behzad.nouri** · Answer 3 · 2015-06-09T12:16:32+00:00

The complexity is O(n). What will slow it down is doing a for loop in Python. If there are k zeros in the series, and log k is negligible compared to the length of the series, an O(n log k) solution would be:

>>> izero = np.r_[-1, np.flatnonzero(ts.to_numpy() == 0)]  # indices of zeros (Series.nonzero was removed in pandas 1.0)
>>> idx = np.arange(len(ts))
>>> idx - izero[np.searchsorted(izero - 1, idx) - 1]
array([1, 2, 0, 1, 2, 3, 4, 0, 1, 2])
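
A different vectorized route, swapped in here as an alternative to the searchsorted lookup, is to carry the index of the most recent zero forward with a running maximum. The sketch below assumes the data fits in a 1-D NumPy array; the function name `distance_via_running_max` is hypothetical.

```python
import numpy as np


def distance_via_running_max(values):
    arr = np.asarray(values)
    idx = np.arange(len(arr))
    # Each zero contributes its own index; every other position contributes -1,
    # which stands for a virtual zero just before the start of the data
    zero_positions = np.where(arr == 0, idx, -1)
    # The running maximum is the index of the most recent zero seen so far
    last_zero = np.maximum.accumulate(zero_positions)
    return idx - last_zero


distances = distance_via_running_max([7, 2, 0, 3, 4, 2, 5, 0, 3, 4])
print(distances.tolist())  # [1, 2, 0, 1, 2, 3, 4, 0, 1, 2]
```

This does a single O(n) pass inside NumPy and needs no sorted lookup.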

**dimid** · Answer 4 · 2021-05-17T22:35:27+00:00

Another option:

import numpy as np
import pandas as pd

df = pd.DataFrame({'X': [7, 2, 0, 3, 4, 2, 5, 0, 3, 4]})
zeros = np.r_[-1, np.where(df.X == 0)[0]]  # zero positions, plus a virtual zero at -1

def d0(a):
    return np.min(a[a>=0])
    
df.index.to_series().apply(lambda i: d0(i - zeros))

Or using pure NumPy:

df = pd.DataFrame({'X': [7, 2, 0, 3, 4, 2, 5, 0, 3, 4]})
a = np.arange(len(df))[:, None] - np.r_[-1 , np.where(df.X == 0)[0]][None]

np.min(a, where=a>=0, axis=1, initial=len(df))
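
To make the broadcasting step concrete, here is a small sketch on a shortened input. The names `values`, `zeros`, `diff` and `masked_min` are chosen for this illustration only.

```python
import numpy as np

values = np.array([7, 2, 0, 3])
zeros = np.r_[-1, np.flatnonzero(values == 0)]  # [-1, 2]: virtual zero plus the real one

# Row i holds the signed distance from position i to every zero position
diff = np.arange(len(values))[:, None] - zeros[None]
print(diff.tolist())  # [[1, -2], [2, -1], [3, 0], [4, 1]]

# Negative entries are zeros that lie ahead, so they are masked out of the minimum;
# `initial` is required when `where` is given and is never smaller than a real distance
masked_min = np.min(diff, where=diff >= 0, axis=1, initial=len(values))
print(masked_min.tolist())  # [1, 2, 0, 1]
```

Note that `diff` has one row per element and one column per zero (plus the virtual one), so memory use grows with the number of zeros in the data.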

**ali bakhtiari** · Answer 5 · 2023-01-06T22:04:38+00:00

As the answer by @behzad.nouri suggests, pandas may not be the best tool for this. Still, here is another variation:

df = pd.DataFrame({'X': [7, 2, 0, 3, 4, 2, 5, 0, 3, 4]})

z = df.ne(0).X
z.groupby((z != z.shift()).cumsum()).cumsum()

0    1
1    2
2    0
3    1
4    2
5    3
6    4
7    0
8    1
9    2
Name: X, dtype: int64

Solution 2:

If you write the following code you will get almost everything you need, except that the first row starts from 0 and not 1:

df = pd.DataFrame({'X': [7, 2, 0, 3, 4, 2, 5, 0, 3, 4]})
df.eq(0).cumsum().groupby('X').cumcount()

0    0
1    1
2    0
3    1
4    2
5    3
6    4
7    0
8    1
9    2
dtype: int64

This happens because cumcount starts counting from 0 within each group, and the first group does not begin with a zero. To get the desired result, I prepended a row containing 0, calculated everything, and then dropped that extra row at the end:

x = pd.DataFrame({'X': [0]})  # extra leading zero, in the same column as the data
df = pd.concat([x, df], ignore_index=True)
df.eq(0).cumsum().groupby('X').cumcount().drop(0).reset_index(drop=True)

0    1
1    2
2    0
3    1
4    2
5    3
6    4
7    0
8    1
9    2
dtype: int64
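
A possible way to avoid prepending and dropping a row is to add 1 only inside the leading group, the one before any zero has been seen. This is a sketch of that idea rather than the answer's own method; `group` and `distance` are illustrative names.

```python
import pandas as pd

s = pd.Series([7, 2, 0, 3, 4, 2, 5, 0, 3, 4])

# Group label: 0 before the first zero, then 1, 2, ... starting at each zero
group = s.eq(0).cumsum()

# cumcount numbers rows from 0 within each group; the leading group has no zero
# at its head, so its counts are shifted up by one
distance = s.groupby(group).cumcount() + group.eq(0).astype(int)
print(distance.tolist())  # [1, 2, 0, 1, 2, 3, 4, 0, 1, 2]
```

If the Series starts with a zero, no row has group label 0 and the correction term is simply zero everywhere.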

**Bill** · Answer 6 · 2023-01-06T22:35:42+00:00

Yet another way to do this is with NumPy's ufunc accumulate. The only catch is that, to initialize the counter at zero, you need to insert a zero in front of the series values.

import numpy as np

# Define Python function
f = lambda a, b: 0 if b == 0 else a + 1

# Convert to Numpy ufunc
npf = np.frompyfunc(f, 2, 1)

# Apply recursively over series values
x = npf.accumulate(np.r_[0, s.values], dtype=object)[1:]  # frompyfunc ufuncs operate on object arrays

print(x)

array([1, 2, 0, 1, 2, 3, 4, 0, 1, 2], dtype=object)
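
The same recurrence can also be written with the standard library's `itertools.accumulate`, whose `initial` argument (Python 3.8+) plays the role of the zero inserted in front. This is a pure-Python sketch for comparison; `step`, `values` and `distances` are names made up for the example.

```python
from itertools import accumulate


def step(count, value):
    # Reset the counter at a zero, otherwise extend it by one
    return 0 if value == 0 else count + 1


values = [7, 2, 0, 3, 4, 2, 5, 0, 3, 4]

# accumulate yields the initial value first, so drop it with [1:]
distances = list(accumulate(values, step, initial=0))[1:]
print(distances)  # [1, 2, 0, 1, 2, 3, 4, 0, 1, 2]
```

Like the `frompyfunc` version, this still calls a Python function once per element, so it is unlikely to be faster than a plain loop.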

**rhug123** · Answer 7 · 2023-01-06T22:42:20+00:00

Here is a way without using groupby:

((v:=pd.Series([7,2,0,3,4,2,5,0,3,4]).ne(0))
.cumsum()
.where(v.eq(0)).ffill().fillna(0)
.rsub(v.cumsum())
.astype(int)
.tolist())

Output:

[1, 2, 0, 1, 2, 3, 4, 0, 1, 2]
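
The chained expression can be unpacked into named steps, which may make the logic easier to follow. The intermediate names below (`nonzero`, `running`, `baseline`, `distance`) are assumptions added for readability.

```python
import pandas as pd

s = pd.Series([7, 2, 0, 3, 4, 2, 5, 0, 3, 4])

nonzero = s.ne(0)
running = nonzero.cumsum()            # [1, 2, 2, 3, 4, 5, 6, 6, 7, 8]

# Keep the running count only at the zeros, then carry it forward;
# positions before the first zero get a baseline of 0
baseline = running.where(~nonzero).ffill().fillna(0)

# Non-zero values seen in total minus those seen up to the last zero
distance = (running - baseline).astype(int)
print(distance.tolist())  # [1, 2, 0, 1, 2, 3, 4, 0, 1, 2]
```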

**Partha Mandal** · Answer 8 · 2021-01-08T16:57:48+00:00

A solution that may not be as performant (haven't really checked), but easier to understand in terms of the steps (at least for me), would be:


df = pd.DataFrame({'X': [7, 2, 0, 3, 4, 2, 5, 0, 3, 4]})
df

df['flag'] = np.where(df['X'] == 0, 0, 1)
df['cumsum'] = df['flag'].cumsum()
# Keep the running count only at the zeros (NaN elsewhere), then carry it forward
df['offset'] = df['cumsum'].where(df['flag'] == 0)
df['offset'] = df['offset'].ffill().fillna(0).astype(int)
df['final'] = df['cumsum'] - df['offset']

df
