I have a pandas Series named series. If I want to get the element-wise floor or ceiling, is there a built-in method, or do I have to write the function and use apply? I ask because the data is big, so I appreciate efficiency. Also, this question has not been asked with respect to the pandas package.
Floor or ceiling of a pandas series in python?
Asked by wolfsatthedoor. There are 7 answers.
UPDATE: THIS ANSWER IS WRONG, DO NOT DO THIS
Explanation: using Series.apply() with a native vectorized NumPy function makes no sense in most cases, as it can end up running the NumPy function in a Python loop, leading to much worse performance. You'd be much better off using np.floor(series) directly, as suggested by several other answers.
You could do something like this using NumPy's floor, for instance, with a DataFrame named data:
import numpy as np
floored_data = data.apply(np.floor)
I can't test it right now, but a working solution should not be far from this.
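As a minimal sketch of the vectorized alternative recommended in the update above, a NumPy ufunc can be called directly on a whole DataFrame, with no apply involved. The frame data and its column names below are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical example frame; the column names are assumptions for illustration.
data = pd.DataFrame({"a": [1.7, -2.3], "b": [0.5, 4.0]})

# Calling the ufunc on the whole frame returns a DataFrame
# with the same index and columns.
floored_data = np.floor(data)
print(floored_data)
```

The same pattern should work with np.ceil.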
The pinned answer is already the fastest. Here I provide some alternatives for ceiling and floor using pure pandas and compare them with the NumPy approach.
series = pd.Series(np.random.normal(100,20,1000000))
Floor
%timeit np.floor(series) # 1.65 ms ± 18.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit series.astype(int) # 2.2 ms ± 131 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit (series-0.5).round(0) # 3.1 ms ± 47 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit round(series-0.5,0) # 2.83 ms ± 60.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
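Why does astype(int) work? Converting a float to an integer drops the fractional part, that is, it truncates toward zero. For non-negative values such as the ones in this benchmark, that is the same as flooring. For negative values it is not: -1.5 becomes -1, whereas the floor is -2.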
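A small sketch of the difference between truncation and flooring, using made-up values that include negatives:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.7, -1.7, 2.0, -0.2])

truncated = s.astype(int)           # drops the fractional part (toward zero)
floored = np.floor(s).astype(int)   # rounds toward negative infinity

print(truncated.tolist())  # [1, -1, 2, 0]
print(floored.tolist())    # [1, -2, 2, -1]
```

The two agree only on the non-negative entries, so astype(int) is a safe substitute for floor only when the data is known to be non-negative.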
Ceil
%timeit np.ceil(series) # 1.67 ms ± 21 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit (series+0.5).round(0) # 3.15 ms ± 46.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit round(series+0.5,0) # 2.99 ms ± 103 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
So yeah, just use the numpy function.
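One caveat worth checking before relying on the rounding variants: for values that are already whole numbers, subtracting or adding 0.5 lands exactly on a .5 tie, and NumPy-backed rounding resolves ties to the nearest even number. A minimal sketch with made-up values:

```python
import numpy as np
import pandas as pd

s = pd.Series([3.0, 4.0, 5.4])

floor_via_round = (s - 0.5).round(0)
ceil_via_round = (s + 0.5).round(0)

print(floor_via_round.tolist())  # [2.0, 4.0, 5.0]
print(np.floor(s).tolist())      # [3.0, 4.0, 5.0]

print(ceil_via_round.tolist())   # [4.0, 4.0, 6.0]
print(np.ceil(s).tolist())       # [3.0, 4.0, 6.0]
```

The results differ at 3.0 in both cases, which is one more reason to prefer np.floor and np.ceil.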
With pd.Series.clip, you can set a floor via clip(lower=x) or a ceiling via clip(upper=x):
s = pd.Series([-1, 0, -5, 3])
print(s.clip(lower=0))
# 0 0
# 1 0
# 2 0
# 3 3
# dtype: int64
print(s.clip(upper=0))
# 0 -1
# 1 0
# 2 -5
# 3 0
# dtype: int64
pd.Series.clip allows generalised functionality, e.g. applying a floor and a ceiling simultaneously with s.clip(-1, 1).
NOTE: This answer originally referred to clip_lower / clip_upper, which were removed in pandas 1.0.0.
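For illustration, here is a short sketch of clipping with both bounds at once. Note that clip treats floor and ceiling as lower and upper bounds on the values; it does not round anything. The second series is a made-up example to show that.

```python
import pandas as pd

s = pd.Series([-1, 0, -5, 3])

# Apply a lower bound of -1 and an upper bound of 1 in one call.
bounded = s.clip(lower=-1, upper=1)
print(bounded.tolist())  # [-1, 0, -1, 1]

# Values inside the bounds are left as they are (no rounding).
fractional = pd.Series([1.7, -2.3]).clip(lower=0)
print(fractional.tolist())  # [1.7, 0.0]
```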
The existing answers are limited. They either raise an error on NaNs in the input series or handle them incorrectly.
You can handle these cases correctly with:
# setup
input_series = pd.Series([pd.NA, pd.NA,3,4,5.4,pd.NA,5.3,7])
# floor all non-nans in input
mask_nan = input_series.isna()
input_series.where(mask_nan, np.floor(input_series[~mask_nan]))
# gives [<NA>, <NA>, 3, 4, 5, <NA>, 5, 7]
Important:
- use pandas .isna() to future-proof against the ongoing pandas NA dtype changes
- use pandas where, not np.where, so we can operate on a subset for the replacement series
- use np.floor() for speed
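The mask-and-where steps above could be wrapped in a small helper, sketched below. The name floor_where_present is hypothetical, and the example uses float NaN rather than pd.NA to keep the dtype simple; both are assumptions of this sketch, not part of the original answer.

```python
import numpy as np
import pandas as pd

def floor_where_present(s: pd.Series) -> pd.Series:
    """Floor the non-missing entries of s and leave missing ones untouched.

    Hypothetical helper; it wraps the mask-and-where steps shown above.
    """
    mask_nan = s.isna()
    # where() keeps the original value where the mask is True (missing)
    # and takes the floored value, aligned by index, everywhere else.
    return s.where(mask_nan, np.floor(s[~mask_nan]))

example = pd.Series([np.nan, 3, 5.4, np.nan, -5.3, 7])
result = floor_where_present(example)
print(result)
```

Swapping np.floor for np.ceil inside the helper should give the corresponding ceiling version.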
You can use NumPy's built-in functions to do this: np.ceil(series) or np.floor(series).
Both return a Series object (not an array), so the index information is preserved.
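A minimal sketch of this behaviour, using a made-up series with a string index so that the preserved index is easy to see:

```python
import numpy as np
import pandas as pd

# Hypothetical data with a non-default index.
series = pd.Series([1.2, -1.5, 3.0], index=["a", "b", "c"])

floored = np.floor(series)
ceiled = np.ceil(series)

print(type(floored).__name__)  # Series
print(floored.tolist())        # [1.0, -2.0, 3.0]
print(ceiled.tolist())         # [2.0, -1.0, 3.0]
print(list(floored.index))     # ['a', 'b', 'c']
```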