What is the difference between statistics.stdev() & numpy.std() and which is more precise?

I used this dataset:

lst = [81922.00557103065, 82887.70053475935, 80413.01627033792,
       81708.86075949368, 82997.38219895288, 84641.50943396226,
       81929.82456140351, 82632.24181360201, 77667.98418972333,
       73726.47427854454, 86113.2075471698, 83232.98429319372,
       79866.66666666667, 83833.74689826302, 81943.06930693069,
       77898.64029666255, 77401.47783251232, 80607.59493670886,
       78384.5126835781, 82608.69565217392, 80824.8730964467,
       84163.70106761566, 74887.38738738738
       ]

Then statistics.stdev(lst) is 3096.28 and numpy.std(lst) is 3028.23. The difference is about 2.2%.

Best answer, by Matt Hall:

They are calculating two slightly different things.

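The standard deviation is the square root of the variance. By default, NumPy computes the population variance, dividing by N (ddof=0), whereas statistics.stdev() computes the sample variance with Bessel's correction, dividing by N - 1 instead of N. Neither is more precise than the other: divide by N - 1 when the data are a sample from a larger population, and by N when the data are the whole population. With N = 23 here, the two results differ by a factor of sqrt(23/22), which is the roughly 2.2% gap you see: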

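import numpy as np

arr = np.array(lst)
var_ordinary = np.sum(np.abs(arr - arr.mean())**2) / arr.size  # divide by N (NumPy default, ddof=0)
var_bessel = np.sum(np.abs(arr - arr.mean())**2) / (arr.size - 1)  # divide by N - 1 (Bessel's correction)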

From the statistics docs for variance(), which stdev() takes the square root of:

This is the sample variance s² with Bessel’s correction, also known as variance with N-1 degrees of freedom. Provided that the data points are representative (e.g. independent and identically distributed), the result should be an unbiased estimate of the true population variance.
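
To check this against the library functions directly, here is a minimal sketch using the dataset from the question. It assumes NumPy is installed, and the variable names (sample_np, pop_stats, and so on) are my own, not part of either library. np.std() accepts a ddof argument, and statistics has a separate pstdev() function for the population version:

```python
import math
import statistics

import numpy as np

# Dataset from the question (N = 23).
lst = [81922.00557103065, 82887.70053475935, 80413.01627033792,
       81708.86075949368, 82997.38219895288, 84641.50943396226,
       81929.82456140351, 82632.24181360201, 77667.98418972333,
       73726.47427854454, 86113.2075471698, 83232.98429319372,
       79866.66666666667, 83833.74689826302, 81943.06930693069,
       77898.64029666255, 77401.47783251232, 80607.59493670886,
       78384.5126835781, 82608.69565217392, 80824.8730964467,
       84163.70106761566, 74887.38738738738]

# Sample standard deviation (divide by N - 1).
sample_stats = statistics.stdev(lst)
sample_np = float(np.std(lst, ddof=1))

# Population standard deviation (divide by N).
pop_stats = statistics.pstdev(lst)
pop_np = float(np.std(lst))  # ddof defaults to 0

print("sample:    ", sample_stats, sample_np)
print("population:", pop_stats, pop_np)

# The two conventions differ only by the factor sqrt(N / (N - 1)).
n = len(lst)
print("ratio:", sample_np / pop_np, "expected:", math.sqrt(n / (n - 1)))
```

The two sample values should agree up to floating-point rounding, as should the two population values. Their ratio is sqrt(N / (N - 1)), about 1.0225 for N = 23, so the gap shrinks as the dataset grows.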