Checking calculation of covariance matrix in R


I came across this formula in a text that says $S$ is the sample covariance matrix where

$$S = \sum_{j=1}^n(\mathbf{X}_j - \bar{\mathbf{X}})(\mathbf{X}_j-\bar{\mathbf{X}})'$$

What I am trying to figure out is how to calculate that equation in R. For example, suppose I have the following:

x <- c(1, 3, 5, 2)
y <- c(2, 3, 8, 7)
z <- c(22, 1, 3, 3)

X <- cbind(x, y, z)

I assume I can just use the cov() function and get

> cov(X)
           x          y         z
x   2.916667   3.333333 -11.25000
y   3.333333   8.666667 -17.66667
z -11.250000 -17.666667  97.58333

I also saw this calculation based on the above formula:

xbar <- apply(X, 2, mean)
d <- as.matrix(t(t(X) - xbar))

s2 <- matrix(0, 3, 3)
for (i in 1:3) {
  s2 <- s2 + (d[i, ]) %*% t(d[i, ])
}
> s2
            x     y        z
[1,]   8.1875  11.5 -36.9375
[2,]  11.5000  22.0 -44.5000
[3,] -36.9375 -44.5 274.6875

But as you can see, the two do not return the same matrix. I am having a hard time figuring out which is the correct way to calculate that equation, or whether neither is correct.
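
Two things seem worth checking, sketched here in R. First, the loop above runs over 1:3, which is the number of columns, while the sum in the formula runs over the $n = 4$ rows of X. Second, cov() divides the sum of outer products by $n - 1$, whereas the formula as quoted has no such factor. The names ss, ss_vec and S are introduced for illustration only and do not come from the text.

```r
# Same data as in the question.
x <- c(1, 3, 5, 2)
y <- c(2, 3, 8, 7)
z <- c(22, 1, 3, 3)
X <- cbind(x, y, z)

n <- nrow(X)               # 4 observations (rows), 3 variables (columns)
xbar <- colMeans(X)        # column means, same values as apply(X, 2, mean)
d <- sweep(X, 2, xbar)     # subtract each column's mean from that column

# Sum of outer products over ALL n rows (the original loop stopped at row 3).
ss <- matrix(0, ncol(X), ncol(X))
for (j in seq_len(n)) {
  ss <- ss + d[j, ] %*% t(d[j, ])
}

# The same sum without a loop: t(d) %*% d.
ss_vec <- crossprod(d)

# cov() uses the n - 1 denominator, so rescale before comparing.
S <- ss / (n - 1)
all.equal(S, cov(X), check.attributes = FALSE)
```

If the text really does define $S$ without the $1/(n-1)$ factor, then ss is the quantity it describes and cov(X) equals ss / (n - 1). The original source would be needed to confirm which definition it intends.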
