Pythonic way to replace list values with upper and lower bounds (clamping, clipping, thresholding)?


I want to replace outliers in a list. To do so, I define an upper and a lower bound: every value above upper_bound or below lower_bound is replaced with the corresponding bound. My approach does this in two steps using a NumPy array.

Now I wonder whether it's possible to do this in one step, since I suspect that could improve both performance and readability.

Is there a shorter way to do this?

import numpy as np

lowerBound, upperBound = 3, 7

arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

arr[arr > upperBound] = upperBound
arr[arr < lowerBound] = lowerBound

# [3 3 3 3 4 5 6 7 7 7]
print(arr)

See "How can I clamp (clip, restrict) a number to some range?" for clamping individual values, including non-NumPy approaches.


Answer by arthur (accepted)

You can use numpy.clip. Passing out=arr writes the result back into arr, so the array is modified in place:

In [1]: import numpy as np

In [2]: arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [3]: lowerBound, upperBound = 3, 7

In [4]: np.clip(arr, lowerBound, upperBound, out=arr)
Out[4]: array([3, 3, 3, 3, 4, 5, 6, 7, 7, 7])

In [5]: arr
Out[5]: array([3, 3, 3, 3, 4, 5, 6, 7, 7, 7])
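
The out= argument is optional. A minimal sketch of a few other ways to call it: without out=, np.clip returns a new array and leaves the original untouched; passing None for one bound clips on the other side only; and ndarray.clip is the method form of the same operation. The variable names clipped, upper_only, lower_only and via_method are just illustrative.

```python
import numpy as np

lowerBound, upperBound = 3, 7
arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

# Without out=, np.clip returns a new array; arr itself is unchanged.
clipped = np.clip(arr, lowerBound, upperBound)

# Passing None for one bound clips on the other side only.
upper_only = np.clip(arr, None, upperBound)
lower_only = np.clip(arr, lowerBound, None)

# The method form on the array gives the same result as np.clip.
via_method = arr.clip(lowerBound, upperBound)

print(clipped)     # [3 3 3 3 4 5 6 7 7 7]
print(upper_only)  # [0 1 2 3 4 5 6 7 7 7]
print(lower_only)  # [3 3 3 3 4 5 6 7 8 9]
print(arr)         # [0 1 2 3 4 5 6 7 8 9]
```

Returning a new array is the safer default when other code still needs the unclipped data; out=arr avoids the extra allocation when it doesn't.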

Answer by mathmandan

For an alternative that doesn't rely on NumPy, you could always use a list comprehension:

arr = [max(lower_bound, min(x, upper_bound)) for x in arr]

If you only wanted an upper bound, you could write arr = [min(x, upper_bound) for x in arr]. Similarly, for only a lower bound, use max instead: arr = [max(x, lower_bound) for x in arr].

The expression above simply combines both operations.
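Wrapped in a reusable helper, the list-comprehension approach might look like the sketch below. The function name clamp_list is hypothetical, and it assumes lower_bound <= upper_bound. It works on any iterable of comparable numbers and returns a new list rather than modifying the input.

```python
def clamp_list(values, lower_bound, upper_bound):
    """Return a new list with each value clamped to [lower_bound, upper_bound].

    Hypothetical helper; assumes lower_bound <= upper_bound.
    """
    return [max(lower_bound, min(x, upper_bound)) for x in values]


arr = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
lower_bound, upper_bound = 3, 7

print(clamp_list(arr, lower_bound, upper_bound))  # [3, 3, 3, 3, 4, 5, 6, 7, 7, 7]

# One-sided variants: min alone caps from above, max alone floors from below.
print([min(x, upper_bound) for x in arr])  # [0, 1, 2, 3, 4, 5, 6, 7, 7, 7]
print([max(x, lower_bound) for x in arr])  # [3, 3, 3, 3, 4, 5, 6, 7, 8, 9]
```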

Edit: Here's a slightly more in-depth explanation:

Given an element x of the array (and assuming that your upper_bound is at least as big as your lower_bound!), you'll have one of three cases:

  1. x < lower_bound
  2. x > upper_bound
  3. lower_bound <= x <= upper_bound.

In case 1, min(x, upper_bound) is x, so the expression becomes max(lower_bound, x), which resolves to lower_bound.

In case 2, min(x, upper_bound) is upper_bound, so the expression becomes max(lower_bound, upper_bound), which resolves to upper_bound.

In case 3, min(x, upper_bound) is x, and max(lower_bound, x) resolves to just x.

In all three cases, the output is what we want.
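The case analysis can be checked mechanically. A minimal sketch: clamp is the answer's expression applied to one value, piecewise (a hypothetical name) spells out the three cases directly, and a brute-force loop compares them over a small range of integers with lower_bound <= upper_bound.

```python
def clamp(x, lower_bound, upper_bound):
    # The expression from the answer, applied to a single value.
    return max(lower_bound, min(x, upper_bound))


def piecewise(x, lower_bound, upper_bound):
    # The three cases written out explicitly.
    if x < lower_bound:   # case 1
        return lower_bound
    if x > upper_bound:   # case 2
        return upper_bound
    return x              # case 3: lower_bound <= x <= upper_bound


# Compare both versions for every x in a small range and every valid pair of bounds.
for lower_bound in range(-3, 4):
    for upper_bound in range(lower_bound, 4):
        for x in range(-5, 6):
            assert clamp(x, lower_bound, upper_bound) == piecewise(x, lower_bound, upper_bound)

print("all cases agree")

# If the bounds are swapped (lower_bound > upper_bound), min(x, upper_bound) is always
# below lower_bound, so the expression returns lower_bound for every x.
print(clamp(5, 7, 3))  # 7
```

This is why the answer's assumption that upper_bound is at least as big as lower_bound matters.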