How do I calculate the standard deviation with a pivot table in Pandas?

I have a bunch of data involving certain numbers for certain players of specific sports. I want to use a pivot table in Pandas to split the data by sport and, for each sport, show the mean "number" value across everyone who plays that sport. (For basketball, for example, it would average the number over all the players who play basketball; the number basically represents a preference.)

I can do this pretty easily with pivot tables, but I cannot figure out how to do the same thing for the standard deviation. I can use np.mean for the mean, but there's no np.std. I know there's std(), but I'm unsure how I'd use it in this context.

Are pivot tables not advisable for doing this task? How should I find the standard deviation for the numeric data of all players of a specific sport?

3

There are 3 answers

1
maxymoo On

Which version of NumPy are you using? Version 1.9.2 has np.std:

np.std?
Type:        function
String form: <function std at 0x0000000003EE47B8>
File:        c:\anaconda3\lib\site-packages\numpy\core\fromnumeric.py
Definition:  np.std(a, axis=None, dtype=None, out=None, ddof=0, keepdims=False)
Docstring:
Compute the standard deviation along the specified axis.

Returns the standard deviation, a measure of the spread of a distribution,
of the array elements. The standard deviation is computed for the
flattened array by default, otherwise over the specified axis.
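
One thing worth noting from the signature above is ddof=0: np.std computes the population standard deviation by default, whereas pandas' own std() defaults to ddof=1 (the sample standard deviation). A minimal sketch of the difference, using made-up values that are not from the question:

```python
import numpy as np
import pandas as pd

# Made-up values for illustration only.
values = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# NumPy default: ddof=0, i.e. divide by N (population standard deviation).
population_std = np.std(values)

# Passing ddof=1 divides by N - 1 (sample standard deviation).
sample_std = np.std(values, ddof=1)

# pandas Series.std() uses ddof=1 by default, so it matches sample_std.
pandas_std = pd.Series(values).std()

print(population_std, sample_std, pandas_std)
```

So if results from NumPy and pandas disagree slightly, the ddof default is the first thing to check.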
0
Paul H On

If you have a DataFrame (df) with a column called "sport", it's as simple as:

df.groupby(by=['sport']).std()
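
A slightly fuller sketch of the same idea, assuming the numeric column is called "number" and that there is also a non-numeric "player" column (both names and all the data are made up here). Selecting the column before calling std() keeps the text column out of the calculation:

```python
import pandas as pd

# Hypothetical data; column names and values are assumptions for illustration.
df = pd.DataFrame({
    "player": ["Ann", "Bob", "Cat", "Dan", "Eve"],
    "sport": ["basketball", "basketball", "basketball", "tennis", "tennis"],
    "number": [1.0, 2.0, 3.0, 4.0, 8.0],
})

# Group by sport, pick the numeric column, then take the sample
# standard deviation (pandas uses ddof=1 by default).
std_by_sport = df.groupby("sport")["number"].std()

print(std_by_sport)
```

The result is a Series indexed by sport, with one standard deviation per sport.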
0
Changoleon On
df.pivot_table(values='number', index='sport', aggfunc='std')
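
If you want the mean and the standard deviation side by side, aggfunc also accepts a list of function names. A minimal sketch, again with made-up data and the assumed column names "sport" and "number":

```python
import pandas as pd

# Hypothetical data; column names and values are assumptions for illustration.
df = pd.DataFrame({
    "sport": ["basketball", "basketball", "basketball", "tennis", "tennis"],
    "number": [1.0, 2.0, 3.0, 4.0, 8.0],
})

# Passing a list to aggfunc produces one column per aggregation.
summary = df.pivot_table(values="number", index="sport", aggfunc=["mean", "std"])

print(summary)
```

Each row is a sport, and the columns hold the mean and the sample standard deviation (ddof=1) of "number" for that sport.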