Even distribution of percentile labels on x axis

Forgive my terminology, I'm not an expert at statistics or plotting!

Using Pandas, I am attempting to plot quantile data that is bucketed up to "5 9s". That is, for a given DataFrame 'df' that has a series 'foo' of unevenly distributed integer values:

q = df['foo'].quantile([.1, .2, .3, .4, .5, .6, .7, .8, .9, .99, .999, .9999, .99999, 1])
q.plot()

This results in a plot where the x-axis intervals between 0.9 and 1.0 are compressed.

Is there a way to evenly space the quantile buckets on the x-axis?

Thanks!

2

There are 2 answers

0
user1612443 On BEST ANSWER

Taking lmo's advice, here's the solution that works for me.

For a given dataframe 'df' that has a series 'A':

import matplotlib.pyplot as plt

percentiles = [.1, .2, .3, .4, .5, .6, .7, .8, .9, .99, .999, .9999, .99999, 1.0]

pct = df['A'].quantile(percentiles)
xticks = range(0, len(percentiles), 1)
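# Drop the quantile index so the points are plotted at positions 0..N-1 (evenly spaced)
ax = pct.reset_index(drop=True).plot(xticks=xticks)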
ax.set_xticklabels([str(p) for p in percentiles])
plt.show()
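
For anyone who wants to try this without their own data, here is a minimal self-contained sketch of the same idea using matplotlib directly. The DataFrame contents, the random seed, and the output filename are made up for illustration; only the list of percentiles comes from the answer above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data standing in for df['A']: skewed integer values.
rng = np.random.default_rng(0)
df = pd.DataFrame({"A": rng.exponential(scale=100, size=100_000).astype(int)})

percentiles = [.1, .2, .3, .4, .5, .6, .7, .8, .9, .99, .999, .9999, .99999, 1.0]
pct = df["A"].quantile(percentiles)

# Plot against positions 0..N-1 instead of the quantile values themselves,
# so every bucket gets the same horizontal width.
positions = list(range(len(percentiles)))
fig, ax = plt.subplots()
ax.plot(positions, pct.to_numpy(), marker="o")

# Put one tick at each position and label it with the quantile it represents.
ax.set_xticks(positions)
ax.set_xticklabels([str(p) for p in percentiles], rotation=45)
ax.set_xlabel("quantile")
ax.set_ylabel("A")
fig.tight_layout()
fig.savefig("quantiles.png")  # assumed filename; use plt.show() interactively
```

The key point is that the x positions are plain integers and the quantile values appear only as tick labels, which is why the spacing comes out even.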

1
piRSquared On

I'd use pd.qcut.

Example:

import pandas as pd
import numpy as np

a = np.sort(np.random.rand(1000))
b = a.repeat(np.arange(len(a)))
b += np.random.rand(len(b)) / 100
s = pd.Series(b)

s.hist()

This is what you want.
Use however many bins you like; I used 20. I also passed a labels parameter. Without it, pandas labels each bin with the edges of the interval where the cuts were made.

q = pd.qcut(s, 20, labels=range(20))
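
As a rough check of what pd.qcut gives you, the sketch below bins a series into 20 quantile buckets and looks at how many values land in each. The data here is hypothetical (a smaller random series than the one above, chosen so it runs quickly), and the names counts and upper are only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical continuous data; distinct values avoid duplicate bin edges.
rng = np.random.default_rng(0)
s = pd.Series(rng.exponential(scale=100, size=10_000))

# 20 quantile-based bins, labelled 0..19 instead of by interval edges.
q = pd.qcut(s, 20, labels=range(20))

# Each bin should hold roughly the same number of values.
counts = q.value_counts().sort_index()
print(counts)

# Largest value in each bin, indexed by bin label 0..19.
upper = s.groupby(q, observed=True).max()
print(upper)
```

Because the bin labels are consecutive integers, plotting a per-bin summary such as upper against them spaces the quantile buckets evenly, regardless of how skewed the underlying values are.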