Transform a Pandas series to be monotonic


I'm looking for a way to remove the points that ruin the monotonicity of a series.

For example

s = pd.Series([0,1,2,3,10,4,5,6])

or

s = pd.Series([0,1,2,3,-1,4,5,6])

we would extract

s = pd.Series([0,1,2,3,4,5,6])

NB: we assume that the first element is always correct.



M. Abreu (best answer)

Monotonic could mean either increasing or decreasing. The functions below exclude all values that break monotonicity in each direction.

However, there seems to be some confusion in your question. In the series s = pd.Series([0,1,2,3,10,4,5,6]), 10 doesn't break the monotonicity condition; 4, 5 and 6 do, because they come after the larger value 10. So the correct answer there is 0, 1, 2, 3, 10.

import pandas as pd

s = pd.Series([0,1,2,3,10,4,5,6])

def to_monotonic_inc(s):
    return s[s >= s.cummax()]

def to_monotonic_dec(s):
    return s[s <= s.cummin()]

print(to_monotonic_inc(s))
print(to_monotonic_dec(s))

Output is 0, 1, 2, 3, 10 for increasing and 0 for decreasing.
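Applied to the second series in the question, the same cummax filter removes only the -1 and already yields the values you wanted. A minimal sketch, repeating the to_monotonic_inc definition from above; the variable names are illustrative, and reset_index(drop=True) is only needed if you want the index renumbered from 0 as in the question:

```python
import pandas as pd


def to_monotonic_inc(s):
    # Keep points that are at or above the running maximum seen so far.
    return s[s >= s.cummax()]


s2 = pd.Series([0, 1, 2, 3, -1, 4, 5, 6])
kept = to_monotonic_inc(s2)
print(kept.index.tolist())  # surviving original positions: [0, 1, 2, 3, 5, 6, 7]

# reset_index(drop=True) renumbers the index 0..n-1, matching the question.
result = kept.reset_index(drop=True)
print(result.tolist())  # [0, 1, 2, 3, 4, 5, 6]
```

The running maximum of s2 is 0, 1, 2, 3, 3, 4, 5, 6, so -1 is the only point that falls below it.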

Perhaps you want to find the longest monotonic subsequence instead? That is a completely different search problem.

----- EDIT -----

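Below is a simple greedy way, in plain Python, of extracting an ascending subsequence under your constraint that the first element is correct. It reproduces the expected output for both of your examples, although taking the values in ascending order does not guarantee the longest subsequence in general: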

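def get_longest_monotonic_asc(s):
    # Pair each value with its position, keep only values >= the first element
    # (assumed correct), sort by (value, position) and drop s[0] itself.
    enumerated = sorted([(v, i) for i, v in enumerate(s) if v >= s[0]])[1:]
    output = [s[0]]
    last_index = 0
    # Greedily take values in ascending order whenever they appear
    # after the last value taken.
    for v, i in enumerated:
        if i > last_index:
            last_index = i
            output.append(v)

    return output

s1 = [0,1,2,3,10,4,5,6]
s2 = [0,1,2,3,-1,4,5,6]

print(get_longest_monotonic_asc(s1))
print(get_longest_monotonic_asc(s2))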

'''
Output:

[0, 1, 2, 3, 4, 5, 6]
[0, 1, 2, 3, 4, 5, 6]

'''

Note that this solution sorts the values, which is O(n log n), followed by a linear O(n) pass, so it is O(n log n) overall.
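The greedy pass takes values in ascending order, so it can miss longer subsequences: for [0, 5, 6, 7, 1] it picks 1 first and returns [0, 1], although [0, 5, 6, 7] is longer. If you really need the longest subsequence, a standard technique is patience sorting with predecessor pointers, which is also O(n log n). Below is a sketch in plain Python; longest_monotonic_asc is a hypothetical name, and it assumes, as in the question, that the first element is always kept and that ties are allowed (non-decreasing):

```python
from bisect import bisect_right


def longest_monotonic_asc(values):
    """Longest non-decreasing subsequence that starts with values[0].

    Hypothetical helper: patience sorting with parent pointers, O(n log n).
    """
    if not values:
        return []
    first = values[0]
    # Only elements after the first and not below it can follow it.
    candidates = [v for v in values[1:] if v >= first]

    tails = []     # tails[k]: smallest tail value of a subsequence of length k + 1
    tail_idx = []  # position in candidates of that tail
    parent = [-1] * len(candidates)  # predecessor position, for reconstruction

    for i, v in enumerate(candidates):
        # bisect_right lets equal values extend a subsequence (non-decreasing).
        k = bisect_right(tails, v)
        if k > 0:
            parent[i] = tail_idx[k - 1]
        if k == len(tails):
            tails.append(v)
            tail_idx.append(i)
        else:
            tails[k] = v
            tail_idx[k] = i

    # Walk the parent pointers back from the tail of the longest subsequence.
    result = []
    i = tail_idx[-1] if tail_idx else -1
    while i != -1:
        result.append(candidates[i])
        i = parent[i]
    return [first] + result[::-1]


print(longest_monotonic_asc([0, 1, 2, 3, 10, 4, 5, 6]))  # [0, 1, 2, 3, 4, 5, 6]
print(longest_monotonic_asc([0, 1, 2, 3, -1, 4, 5, 6]))  # [0, 1, 2, 3, 4, 5, 6]
print(longest_monotonic_asc([0, 5, 6, 7, 1]))            # [0, 5, 6, 7]
```

Switching bisect_right to bisect_left (and the filter to v > first) would give a strictly increasing subsequence instead.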

jsmart

Here is a way to produce a monotonically increasing series:

import pandas as pd

# create data
s = pd.Series([1, 2, 3, 4, 5, 4, 3, 2, 3, 4, 5, 6, 7, 8])

# find max so far (i.e., running_max)
df = pd.concat([s.rename('orig'), 
                s.cummax().rename('running_max'),
               ], axis=1)

# are we at or above max so far?
df['keep?'] = (df['orig'] >= df['running_max'])

# filter out one or many points below max so far
df = df.loc[ df['keep?'], 'orig']

# verify that remaining points are monotonically increasing
assert pd.Index(df).is_monotonic_increasing

# print(df.drop_duplicates()) # eliminates ties
print(df)                     # keeps ties

0     1
1     2
2     3
3     4
4     5
10    5 # <-- same as previous value -- a tie
11    6
12    7
13    8
Name: orig, dtype: int64

You can compare the original and filtered series graphically with s.plot() and df.plot().
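If ties should be dropped as well (what the commented-out drop_duplicates() line does), one option is to compare each point with the running maximum of the points before it, so that a value equal to an earlier maximum is rejected. A minimal sketch on the same data; prev_max, keep and strict are illustrative names:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5, 4, 3, 2, 3, 4, 5, 6, 7, 8])

# Running maximum of everything before each point (NaN for the first point).
prev_max = s.cummax().shift()

# Keep points strictly above all earlier points; the first point is always kept
# (comparing with NaN yields False, so it is set explicitly).
keep = s > prev_max
keep.iloc[0] = True

strict = s[keep]
assert strict.is_monotonic_increasing and strict.is_unique
print(strict.tolist())  # [1, 2, 3, 4, 5, 6, 7, 8]
```

This does the filtering with a single boolean mask, so the tie at index 10 is never kept in the first place.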