I am trying to access the labels (i.e. the positional indicator of each bin) after binning my data into deciles:
q = pd.qcut(df["revenue"], 10)
Output of q.head():
7 (317.942, 500.424]
81 (317.942, 500.424]
83 (150.65, 317.942]
84 [0.19, 150.65]
85 (317.942, 500.424]
Name: revenue, dtype: category
Categories (10, object): [[0.19, 150.65] < (150.65, 317.942] < (317.942, 500.424] < (500.424, 734.916] ... (1268.306, 1648.35] < (1648.35, 1968.758] < (1968.758, 2527.675] < (2527.675, 18690.2]]
This post shows that you can access the labels with:
>>> q.labels
But when I do that I get:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-246-e806c96b1ab2> in <module>()
----> 1 q.labels
C:\Users\blah\Anaconda3\lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
2666 if (name in self._internal_names_set or name in self._metadata or
2667 name in self._accessors):
-> 2668 return object.__getattribute__(self, name)
2669 else:
2670 if name in self._info_axis:
AttributeError: 'Series' object has no attribute 'labels'
In any case, what I want is to use the labels to filter my data, probably by adding a new column to df that holds the positional label of each row's decile (or quantile).
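A minimal sketch of that goal, assuming a hypothetical df with a numeric revenue column (the random data below only stands in for the real frame). Because a Series has no .labels attribute, the integer bin positions can be read from the cat accessor as q.cat.codes, and pd.qcut(..., labels=False) returns the same positions directly.

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for the question's df (assumption: numeric "revenue")
rng = np.random.default_rng(0)
df = pd.DataFrame({"revenue": rng.uniform(0.19, 18690.2, size=100)})

q = pd.qcut(df["revenue"], 10)

# A Series has no .labels; the bin positions live on the cat accessor as integer codes
df["decile"] = q.cat.codes  # 0 = lowest decile, 9 = highest

# labels=False skips the interval labels and returns the integer bin positions directly
df["decile_direct"] = pd.qcut(df["revenue"], 10, labels=False)

# Both columns hold the same positions
print((df["decile"] == df["decile_direct"]).all())
```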
I personally like using the labels parameter in pd.qcut to specify clean-looking, consistent labels. As @jeremycg pointed out, you access category information via the cat accessor attribute.
You can quickly describe each bin.
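A sketch of those three ideas together, assuming a hypothetical df with a numeric revenue column; the label names "D1" to "D10" are illustrative, not taken from the original answer. Grouping by the labelled column and calling describe gives count, mean, std, min, quartiles and max of revenue for each decile.

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for the question's df (assumption: numeric "revenue")
rng = np.random.default_rng(0)
df = pd.DataFrame({"revenue": rng.uniform(0.19, 18690.2, size=100)})

# Illustrative label names; any list with one entry per bin works
labels = [f"D{i}" for i in range(1, 11)]  # "D1" (lowest) ... "D10" (highest)
df["decile"] = pd.qcut(df["revenue"], 10, labels=labels)

# The cat accessor exposes the category information
print(df["decile"].cat.categories)    # the ten labels, in bin order
print(df["decile"].cat.codes.head())  # integer position of each row's bin, 0..9

# Quickly describe each bin: count, mean, std, min, quartiles and max of revenue
summary = df.groupby("decile", observed=True)["revenue"].describe()
print(summary)
```

Passing observed=True keeps only categories that actually occur in the data; with every decile populated the result is the same either way.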
You can filter.
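A sketch of filtering on the binned column, again assuming a hypothetical df and the illustrative labels "D1" to "D10". pd.qcut returns an ordered categorical, so comparisons such as >= follow bin order rather than string order ("D10" sorts after "D8" here even though it would not as a plain string).

```python
import numpy as np
import pandas as pd

# Hypothetical data and illustrative labels (assumptions, not from the original answer)
rng = np.random.default_rng(0)
df = pd.DataFrame({"revenue": rng.uniform(0.19, 18690.2, size=100)})
labels = [f"D{i}" for i in range(1, 11)]
df["decile"] = pd.qcut(df["revenue"], 10, labels=labels)

# Filter by label: only the top decile
top = df[df["decile"] == "D10"]

# Ordered categorical: keeps the top three deciles (D8, D9, D10)
top3 = df[df["decile"] >= "D8"]

# Or filter on the integer position via the cat accessor: bottom three deciles
bottom3 = df[df["decile"].cat.codes < 3]
```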
You can also z-score within deciles.
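One way to z-score within deciles is a groupby transform, sketched here on a hypothetical df; it uses labels=False for integer bin positions, though a labelled column would group the same way. transform returns a Series aligned to df's index, so each row is standardized against the mean and sample standard deviation (ddof=1, the pandas default) of its own decile.

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for the question's df (assumption: numeric "revenue")
rng = np.random.default_rng(0)
df = pd.DataFrame({"revenue": rng.uniform(0.19, 18690.2, size=100)})
df["decile"] = pd.qcut(df["revenue"], 10, labels=False)

# Per-row mean and std of the row's own decile, broadcast back by transform
grouped = df.groupby("decile")["revenue"]
df["z_within_decile"] = (df["revenue"] - grouped.transform("mean")) / grouped.transform("std")
```

After this, each decile's z-scores have mean 0 and sample standard deviation 1.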