I'm conducting a factor analysis of several variables in R using factanal() (but am open to using other packages). I want to determine each case's factor score, but I want the factor scores to be unstandardized and on the original metric of the input variables. When I run the factor analysis and obtain the factor scores, they are standardized with a normal distribution of mean=0, SD=1, and are not on the original metric of the input variables. How can I obtain unstandardized factor scores that have the same metric as the input variables? Ideally, this would mean a similar mean, sd, range, and distribution.
I asked a similar question previously, but the respondent's answer involved rescaling standardized (i.e., normally distributed) factor scores. Note that I don't want to transform standardized factor scores to unstandardized ones because the distributions of my indicators are non-normal (i.e., the normal distribution of standardized factor scores cannot be easily transformed to the raw metric of my indicators). In other words, I'd like to estimate unstandardized factor scores on the raw metric of the indicators without first estimating them on a standardized metric.
Also, there are some missing data. How can I obtain (unstandardized) factor scores for all cases, even those who don't have data on all items?
Here's a small example:
library(psych)
v1 <- c(1,1,1,NA,1,1,1,1,1,1,3,3,3,3,3,4,5,6)
v2 <- c(1,2,1,1,1,1,2,1,2,1,3,4,3,3,3,4,6,5)
v3 <- c(3,3,3,3,3,1,1,1,1,1,1,1,1,1,1,5,4,6)
v4 <- c(3,3,4,3,3,1,1,2,NA,1,1,1,2,1,1,5,6,4)
v5 <- c(1,1,1,1,1,3,3,3,3,3,1,1,1,1,1,6,4,5)
v6 <- c(1,1,1,2,1,3,3,3,4,3,1,1,NA,2,1,6,5,4)
m1 <- cbind(v1,v2,v3,v4,v5,v6)
m1FactorScores <- factanal(~v1+v2+v3+v4+v5+v6, factors = 1, scores = "Bartlett", na.action="na.exclude")$scores
describe(m1) #means~2.3, sds~1.5
describe(m1FactorScores) #mean=0, sd=1
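For reference, here is a minimal sketch of the rescaling approach I'm trying to avoid, so it's clear what I mean. The pooled targets (target_mean, target_sd) and the skew() helper are my own illustrative choices, not part of factanal() or psych. The point is that a linear rescaling changes the mean and SD but leaves the shape of the score distribution untouched:

```r
# Illustration only: linearly rescaling standardized factor scores.
v1 <- c(1,1,1,NA,1,1,1,1,1,1,3,3,3,3,3,4,5,6)
v2 <- c(1,2,1,1,1,1,2,1,2,1,3,4,3,3,3,4,6,5)
v3 <- c(3,3,3,3,3,1,1,1,1,1,1,1,1,1,1,5,4,6)
v4 <- c(3,3,4,3,3,1,1,2,NA,1,1,1,2,1,1,5,6,4)
v5 <- c(1,1,1,1,1,3,3,3,3,3,1,1,1,1,1,6,4,5)
v6 <- c(1,1,1,2,1,3,3,3,4,3,1,1,NA,2,1,6,5,4)
m1 <- cbind(v1, v2, v3, v4, v5, v6)

fit <- factanal(~v1+v2+v3+v4+v5+v6, factors = 1,
                scores = "Bartlett", na.action = na.exclude)
z <- as.numeric(fit$scores)  # factor scores; NA where a case was excluded

# Hypothetical targets: pooled mean and average SD of the indicators.
target_mean <- mean(m1, na.rm = TRUE)
target_sd   <- mean(apply(m1, 2, sd, na.rm = TRUE))

# Linear rescaling: standardize z explicitly, then shift and stretch.
rescaled <- target_mean +
  target_sd * (z - mean(z, na.rm = TRUE)) / sd(z, na.rm = TRUE)

# Sample skewness (illustrative helper); invariant under linear rescaling.
skew <- function(x) {
  x <- x[!is.na(x)]
  mean((x - mean(x))^3) / mean((x - mean(x))^2)^1.5
}
c(skew_z = skew(z), skew_rescaled = skew(rescaled))
```

The rescaled scores match the target mean and SD, but their skewness is identical to that of the standardized scores, and cases with missing items still get no score. That is why I'm looking for a way to estimate the scores directly on the raw metric.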
The data above are just a small example. My actual data are not Likert/ordinal data. They are forecasts of football players' passing yards from various sources. My hope is that a "latent average" would forecast players' passing yards more accurately than a simple average because it would discard the unique biases of each source. The data are highly positively skewed, however, and forcing the latent variable and its factor scores to be normally distributed results in implausibly high values for many players (e.g., over 6,000 passing yards next season).