Calculating the variance and standard deviation by following the Wikipedia description gives different results from R's standard functions var() and sd().
Variance: 4 versus 4.571429. Standard deviation: 2 versus 2.13809.
Does anyone have a suggestion or an explanation?
> df <- c(2,4,4,4,5,5,7,9)
> df.length <- length(df)
> df.length
[1] 8
> df.mean <- sum(df) / df.length
> df.mean
[1] 5
> df.difference <- (df - df.mean)**2
> df.difference
[1] 9 1 1 1 0 0 4 16
> sum(df.difference)
[1] 32
> df.variance <- sum(df.difference) / df.length
> df.variance
[1] 4
> df.standard.deviation <- sqrt(df.variance)
> df.standard.deviation
[1] 2
> # mean, var and sd (default R)
> mean(df)
[1] 5
> var(df)
[1] 4.571429
> sd(df)
[1] 2.13809
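It's the difference between dividing by n or by n-1 (the degrees of freedom). Your calculation divides by n, which gives the population variance; R's var() and sd() divide by n-1, which gives the sample variance. It's n-1 because ... copied straight from Wikipedia (link)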
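To make the two denominators concrete, here is a minimal sketch using the data from the question. The names x, n, ss, pop_var and sample_var are only illustrative and do not appear in the original post. It computes both versions by hand and checks that the n-1 version agrees with var().

```r
# Data from the question
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)

# Sum of squared deviations from the mean
ss <- sum((x - mean(x))^2)

# Population variance: divide by n (the manual calculation in the question)
pop_var <- ss / n

# Sample variance: divide by n - 1 (what var() computes)
sample_var <- ss / (n - 1)

# Check that the n - 1 version agrees with var()
all.equal(sample_var, var(x))

# Rescale the built-in results to get the population versions
var(x) * (n - 1) / n        # population variance
sqrt(var(x) * (n - 1) / n)  # population standard deviation
```

If you want the population figures from the question (4 and 2), rescaling var() by (n - 1) / n as in the last two lines is one way to get them, since base R has no separate built-in for the population variance.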