How to bin a numerical pandas Series into n groups of approximately the same size without qcut?


I would like to split my series into exactly n groups (assuming there are at least n distinct values in the series), where the group sizes are approximately equal.

The code needs to be generic: I do not know the distribution of the data in advance, so using pd.cut with predefined bins is not an option for me.

I tried pd.qcut, and pd.cut with edges taken from pd.Series.quantile, but both produce fewer than n bins when a value is repeated so often in the series that several quantile edges coincide.

For instance, if I want exactly 3 groups:

import pandas as pd

series = pd.Series([1, 1, 1, 1, 1, 1, 1, 1, 3, 3, 4, 4, 4, 4])
pd.qcut(series, q=3, duplicates="drop")

creates only 2 categories: Categories (2, interval[float64]): [(0.999, 3.0] < (3.0, 4.0]], whereas I would like to get something like [(0.999, 1.0] < (1.0, 3.0] < (3.0, 4.0]].
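The collapse can be seen by looking at the quantile edges directly. This is a small check, assuming the default linear interpolation of Series.quantile: the 0 and 1/3 quantiles are both 1, so one edge is a duplicate and only two intervals survive once it is dropped.

```python
import pandas as pd

series = pd.Series([1, 1, 1, 1, 1, 1, 1, 1, 3, 3, 4, 4, 4, 4])

# Edges that a 3-way quantile split is based on: the 0, 1/3, 2/3 and 1 quantiles.
edges = series.quantile([0, 1 / 3, 2 / 3, 1])
print(edges.tolist())  # [1.0, 1.0, 3.0, 4.0]

# The first two edges coincide because 1 makes up more than a third of the data.
print(edges.duplicated().any())  # True
```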

Is there any way to do this easily with pandas' built-in methods?
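For reference, here is a minimal sketch of one possible workaround. It is not a built-in pandas feature. It chooses the n - 1 cut points among the distinct values, greedily picking for each cut the value whose cumulative count is closest to k/n of the data, while leaving enough distinct values for the remaining groups. The helper name bin_into_n_groups and the 0.001 offset below the minimum are assumptions made for illustration, and the greedy choice is not guaranteed to give the most balanced split possible.

```python
import numpy as np
import pandas as pd


def bin_into_n_groups(series, n):
    """Hypothetical helper: cut `series` into exactly n intervals of roughly equal size."""
    counts = series.value_counts().sort_index()  # NaN values are ignored here
    values = counts.index.to_numpy(dtype=float)  # sorted distinct values
    if len(values) < n:
        raise ValueError("need at least n distinct values")

    cum = counts.cumsum().to_numpy()  # rows with value <= values[i]
    total = cum[-1]

    cuts = []  # indices into `values` used as right edges of the first n - 1 bins
    lo = 0
    for k in range(1, n):
        target = k * total / n
        # Leave at least one distinct value for each of the remaining n - k bins.
        hi = len(values) - 1 - (n - k)
        candidates = np.arange(lo, hi + 1)
        best = candidates[np.argmin(np.abs(cum[candidates] - target))]
        cuts.append(best)
        lo = best + 1

    # Assumption: an arbitrary 0.001 offset so the minimum falls inside the first bin.
    edges = [values[0] - 0.001] + [values[i] for i in cuts] + [values[-1]]
    return pd.cut(series, bins=edges)


series = pd.Series([1, 1, 1, 1, 1, 1, 1, 1, 3, 3, 4, 4, 4, 4])
binned = bin_into_n_groups(series, 3)
print(binned.cat.categories)  # three intervals: (0.999, 1.0], (1.0, 3.0], (3.0, 4.0]
print(binned.cat.codes.value_counts().sort_index().tolist())  # [8, 2, 4]
```

Because the cut points are always distinct observed values, identical values never end up in different groups. The price is that group sizes can be far from equal when one value dominates, as in the example above.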
