I'm running into a weird issue where results from a loop differ from the ones I get when I run the same code manually. I need to count the number of levels in a set of variables in a dataset, so I wrote a bit of code that turns a variable into a factor, counts the number of levels, and converts the result to numeric. This works fine. But when I loop it across all variables, it says each variable has 1 level. This happens on the dataset I'm using and on the sample data I created below. It also happened when I wrote a function and used that in apply instead of a for loop. It must be something wrong with the loop, but I'm stuck. Any thoughts?
Here is the sample data: a data frame with three variables (X, Y, Z) and 18 observations.
X <- rep(c(1,2,3), 6)
Y <- rep(c(1,2), 9)
Z <- rep(c(1,2,3,4,5,6), 3)
XYZ_df <- as.data.frame(cbind(X,Y,Z))
I then count the number of levels for each variable:
levelsX <- as.numeric(nlevels(as.factor(XYZ_df$X)))
levelsY <- as.numeric(nlevels(as.factor(XYZ_df$Y)))
levelsZ <- as.numeric(nlevels(as.factor(XYZ_df$Z)))
The result is right: levelsX is 3, levelsY is 2, and levelsZ is 6.
But when I loop it, the result changes. I created a vector with the names of the variables, then used it in a for loop, pasting the XYZ_df$ prefix onto each loop entry:
vars <- c("X", "Y", "Z")
outlist <- list()
for (a in vars) {
  levels <- as.numeric(nlevels(as.factor(paste("XYZ_df$",a, sep=""))))
  out <- c(a, levels)
  outlist <- c(outlist, list(out))
}
final <- do.call("rbind", outlist)
When I do this, the levels entry for each variable is 1.
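A likely explanation, sketched here using the sample data above: paste() only builds the character string "XYZ_df$X"; it does not look up the column. as.factor() then receives a single string, which has exactly one level. One way around this is to index the data frame with [[ ]], which retrieves the column whose name is stored in the loop variable. The names n_pasted and n_levels are illustrative and not part of the original code.

```r
# Sample data from the question
X <- rep(c(1, 2, 3), 6)
Y <- rep(c(1, 2), 9)
Z <- rep(c(1, 2, 3, 4, 5, 6), 3)
XYZ_df <- as.data.frame(cbind(X, Y, Z))

# paste() returns text, not the column, so the factor has a single level
pasted <- paste("XYZ_df$", "X", sep = "")
n_pasted <- nlevels(as.factor(pasted))

# [[ ]] looks the column up by the name held in the loop variable
vars <- c("X", "Y", "Z")
n_levels <- integer(0)
for (a in vars) {
  n_levels[a] <- nlevels(as.factor(XYZ_df[[a]]))
}

# Collect the names and counts in one table
final <- data.frame(variable = vars, levels = unname(n_levels))
print(final)
```

Storing the counts in a data frame rather than rbind-ing c(a, levels) also keeps the counts numeric; combining a name and a number with c() coerces both to character.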
As I said, this happened with two datasets, so I know it's my code. It happened with apply as well. And it happened both the way I did it above (pasting the XYZ_df$ prefix to the variable name) and when I used attach(XYZ_df) and just entered the variable name.
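The attach() and apply attempts probably fail for the same reason, assuming the function was given the variable names rather than the columns: a loop variable holding "X" is still just a one-element character vector, attached or not. A minimal sketch of an alternative is to iterate over the data frame itself with sapply(), which passes each column to the function. The helper name count_levels is hypothetical.

```r
# Same sample data as in the question, built directly as a data frame
XYZ_df <- data.frame(
  X = rep(c(1, 2, 3), 6),
  Y = rep(c(1, 2), 9),
  Z = rep(c(1, 2, 3, 4, 5, 6), 3)
)

# A variable name is only a string, so it yields one level
a <- "X"
n_from_name <- nlevels(as.factor(a))

# A data frame is a list of columns; sapply() hands each column to the function
count_levels <- function(column) nlevels(as.factor(column))
level_counts <- sapply(XYZ_df, count_levels)
print(level_counts)
```

The result is a named vector with one count per column, so no prefix pasting or attach() is needed.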