I would like to split my series into exactly n groups (assuming there are at least n distinct values in the series), where the group sizes are approximately equal.
The code needs to be generic, so I cannot know the distribution of the data in advance; hence using pd.cut with pre-defined bins is not an option for me.
I tried using pd.qcut, or pd.cut with pd.Series.quantile, but they all fall short when some value is repeated very often in the series.
For instance, if I want exactly 3 groups:
import pandas as pd
series = pd.Series([1, 1, 1, 1, 1, 1, 1, 1, 3, 3, 4, 4, 4, 4])
pd.qcut(series, q=3, duplicates="drop")
creates only 2 categories: Categories (2, interval[float64]): [(0.999, 3.0] < (3.0, 4.0]], whereas I would like to get something like [(0.999, 1.0] < (1.0, 3.0] < (3.0, 4.0]].
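To show where the third category goes, here is a minimal sketch that inspects the quantile edges directly. It assumes that pd.qcut derives its bins from the same edges that pd.Series.quantile returns with its default linear interpolation; the names n_groups, edges, unique_edges and n_bins are introduced only for this illustration.

```python
import pandas as pd

series = pd.Series([1, 1, 1, 1, 1, 1, 1, 1, 3, 3, 4, 4, 4, 4])
n_groups = 3  # illustrative name: the number of groups I want

# Edges at the quantiles 0, 1/3, 2/3 and 1 (assumed to match what qcut uses).
edges = series.quantile([i / n_groups for i in range(n_groups + 1)])

# The value 1 fills more than a third of the series, so the first two edges
# coincide; duplicates="drop" then removes one of them.
unique_edges = edges.drop_duplicates()

# k distinct edges can only delimit k - 1 intervals.
n_bins = len(unique_edges) - 1

print(edges.tolist())
print(n_bins)
```

Because the edges at quantiles 0 and 1/3 are both 1, only three distinct edges remain after dropping duplicates, which leaves two intervals instead of three.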
Is there any way to do this easily with pandas' built-in methods?