Why does PCA in R seem to be using row order as a variable?

I am running a PCA in R. I am not a programmer by any means, so please be patient with me if I'm too vague or use incorrect terminology :)

So, for context, I am running PCA on a giant dataset of US counties with a ton of demographic data:

pcatest <- prcomp(countydata, center = TRUE, scale. = TRUE)

Before this, prcomp would not accept my countydata data frame, complaining that it was "not numeric," so I unlisted it, applied as.numeric, built a matrix from the result, and turned that back into a data frame.
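One thing that may be worth ruling out is the conversion step itself: unlisting the whole table and calling as.numeric() converts every column the same way, including any ID, name, or factor columns. Below is a minimal sketch on a hypothetical toy data frame (the columns `row_id`, `state`, `median_age`, and `pct_renters` and their values are made up, not taken from countydata). It shows that as.numeric() on a factor returns level codes, then converts columns one at a time and drops an ID column before calling prcomp().

```r
# Hypothetical toy stand-in for countydata; all column names and values are made up.
toy <- data.frame(
  row_id      = 1:5,
  state       = factor(c("AL", "AK", "AZ", "IA", "IL")),
  median_age  = c(38.2, 34.5, 41.0, 40.1, 36.7),
  pct_renters = c("30.1", "35.4", "28.9", "27.6", "33.0"),
  stringsAsFactors = FALSE
)

# Pitfall: as.numeric() on a factor returns its internal level codes
# (the alphabetical position of each label), not a measured quantity.
state_codes <- as.numeric(toy$state)
print(state_codes)  # 2 1 3 4 5

# Convert text-stored numbers one column at a time, so each value
# stays in its own row and column.
toy$pct_renters <- as.numeric(toy$pct_renters)

# Keep only numeric columns, then drop ID columns by name: row_id is
# numeric but only records position, so it should not be a PCA input.
is_num <- vapply(toy, is.numeric, logical(1))
pca_input <- toy[, is_num, drop = FALSE]
pca_input <- pca_input[, setdiff(names(pca_input), "row_id"), drop = FALSE]

pca_toy <- prcomp(pca_input, center = TRUE, scale. = TRUE)
print(summary(pca_toy))
```

Selecting and dropping columns by name, rather than converting the whole table at once, means an identifier cannot slip into the PCA just because it happens to be stored as a number.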

Anyway, after doing this, I noticed that the PCA results looked off. For most US counties, PC1 was around -0.9, but for nearly every county in Iowa, as well as some in Illinois and Indiana, PC1 ranged from 20 to 40. Counties in Alabama, Alaska, and Arizona also had much lower than average values, even though those states are demographically very different from one another. I checked my data carefully and found nothing that would explain this. I also checked whether the row number or some other ordering had accidentally been included as a variable in the PCA, and it didn't look like it had!
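One way to check whether PC1 is being driven by an ordering or ID column is to look at the PC1 loadings and at how closely the PC1 scores follow row order. Here is a minimal sketch on hypothetical random data (the column names `income`, `age`, `density`, `row_index`, and `id_code` are made up), where two columns encode nothing but row order. On the real result, the same lines would apply to `pcatest` and `nrow(countydata)`.

```r
# Hypothetical toy data: three unrelated random measurements plus two
# columns that only encode row order (a running index and an ID code
# that increases with it). Neither order column is a real measurement.
set.seed(42)
n <- 50
toy <- data.frame(
  income    = rnorm(n),
  age       = rnorm(n),
  density   = rnorm(n),
  row_index = seq_len(n),
  id_code   = 1000 + 2 * seq_len(n)
)

pca_toy <- prcomp(toy, center = TRUE, scale. = TRUE)

# PC1 loadings sorted by absolute size: the columns that dominate PC1 come first.
pc1 <- pca_toy$rotation[, "PC1"]
pc1_sorted <- pc1[order(abs(pc1), decreasing = TRUE)]
print(round(pc1_sorted, 3))

# If PC1 scores track row order almost perfectly, an order or ID column
# has probably leaked into the input.
order_cor <- cor(pca_toy$x[, "PC1"], seq_len(n))
print(order_cor)
```

Because the two order columns are perfectly correlated, they form the strongest shared direction in the scaled data, so they tend to top the PC1 loadings here; a real demographic variable that happens to be sorted along with the rows could show up the same way.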

Now I don't know what to do. Maybe the problem comes from the conversion I had to do to get prcomp to run, maybe not. Has anyone else run into this? If so, I would really appreciate some help. Thank you! :)
