Conditional counting in pandas df

I have a dataframe of stock prices:

import pandas as pd
df = pd.DataFrame([100, 101, 99, 100, 105, 104, 106], columns=['P'])

I would like to create a counter column that increments when the current price is higher than the previous row's price. However, if the price then drops, the counter should only increment again once the previous high is exceeded (like a high-water mark). Below is the desired column:

df['counter'] = [np.nan, 1, 1, 1, 2, 2, 3]

The second row's price is 101, which exceeds 100, so the counter is 1. The price then drops to 99 and comes back to 100, but the counter stays at 1 because we have not exceeded 101 (the watermark). Once we exceed 101 in row 4, with a price of 105, the counter goes to 2. The price then drops to 104, so we stay at 2, and when it rises to 106 the counter increases to 3.
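For reference, the rule described above can be written as a plain loop. This is only a sketch of how I read the question: `watermark_counter` is a hypothetical helper name, and it assumes the first row should be NaN because it has no previous price to compare with.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([100, 101, 99, 100, 105, 104, 106], columns=["P"])


def watermark_counter(prices):
    """Count how often the highest price seen so far is exceeded (hypothetical helper)."""
    counts = []
    high = None  # highest price seen so far (the watermark)
    count = 0
    for price in prices:
        if high is None:
            # First row: nothing to compare with, so record NaN.
            high = price
            counts.append(np.nan)
            continue
        if price > high:
            # A new high: move the watermark up and bump the counter.
            high = price
            count += 1
        counts.append(count)
    return counts


df["counter"] = watermark_counter(df["P"])
```

A loop like this is slow on long series, but it is handy as a check for the vectorized answers.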

There are 3 answers

Ben G On BEST ANSWER

Algorithm:

  1. Find the maximum value observed so far at each row (inclusive of the current row).

  2. Find the maximum value observed as of the preceding row.

  3. Each time a difference exists between these two values, we know that a new water mark has been hit within the current row.

  4. Calculate the cumulative sum of the number of times a new water mark has been hit.

     df["current_observed_max"] = df["P"].cummax()
     df["previous_observed_max"] = df["current_observed_max"].shift(1)
     df["is_new_watermark"] = (df["current_observed_max"] != df["previous_observed_max"]).astype(int)
     df["counter"] = df["is_new_watermark"].cumsum()
    

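Note that the first row has no previous maximum (the shifted value is NaN), so it always registers as a new watermark and the counter starts at 1. Subtract 1 if you would like the first observed number to count as 0.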
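Put together with the question's data, the steps can be sketched as follows. The intermediate variable names are my own, and subtracting 1 assumes you want the first row to count as 0.

```python
import pandas as pd

df = pd.DataFrame([100, 101, 99, 100, 105, 104, 106], columns=["P"])

# Steps 1 and 2: running maximum, and the running maximum one row earlier.
running_max = df["P"].cummax()
previous_max = running_max.shift(1)  # NaN in the first row

# Step 3: True wherever the running maximum changed (always True in the first row).
is_new_watermark = running_max != previous_max

# Step 4: cumulative count, minus 1 so the first row starts at 0.
df["counter"] = is_new_watermark.cumsum() - 1
```

For this data the counter comes out as 0, 1, 1, 1, 2, 2, 3.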

wwnde On

Another way: flag the rows whose value equals the cumulative maximum, then take cumsum() of the flags to number the groups

df['newP'] = (df['P'].cummax() == df['P']).cumsum() - 1

     P  newP
0  100     0
1  101     1
2   99     1
3  100     1
4  105     2
5  104     2
6  106     3
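One thing to be aware of: comparing each price with the cumulative maximum also counts a row that only matches the previous high without exceeding it. The sketch below uses made-up prices (not from the question) to compare that behaviour with counting only changes in the running maximum; the variable names are illustrative.

```python
import pandas as pd

# Hypothetical prices in which the high of 101 is matched but not exceeded.
prices = pd.Series([100, 101, 101, 99, 102])
running_max = prices.cummax()

# Counts every row that equals the running maximum, ties included.
by_equality = (running_max == prices).cumsum() - 1

# Counts only rows where the running maximum actually changes.
by_change = running_max.ne(running_max.shift()).cumsum() - 1

print(pd.DataFrame({"P": prices, "by_equality": by_equality, "by_change": by_change}))
```

Here `by_equality` gives 0, 1, 2, 2, 3 while `by_change` gives 0, 1, 1, 1, 2. If repeated highs cannot occur in your data, or should be counted, the two approaches are interchangeable.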
mozway On

A very simple and efficient method is to combine pandas.factorize and cummax:

df['counter'] = pd.factorize(df['P'].cummax())[0]

Output:

     P  counter
0  100        0
1  101        1
2   99        1
3  100        1
4  105        2
5  104        2
6  106        3
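If the first row should be NaN, as in the question's desired column, one option is to mask it afterwards. This is a minimal sketch, assuming a float column is acceptable (NaN forces the upcast); `codes` and `counter` are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([100, 101, 99, 100, 105, 104, 106], columns=["P"])

# factorize numbers the distinct running maxima in order of first appearance.
codes = pd.factorize(df["P"].cummax())[0]
counter = pd.Series(codes, index=df.index)

# Keep every row except the first; Series.where fills the rest with NaN.
df["counter"] = counter.where(np.arange(len(df)) > 0)
```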