Results change between loop and manual entry in R


I'm running into a weird issue where looped results differ from manually entered ones. I need to count the number of levels in each of a set of variables in a dataset. So I wrote a little code that turns a variable into a factor, counts its levels, and converts the count to numeric. This works fine. But when I loop it across all variables, it says each variable has 1 level. This happened on the dataset I'm using and on the sample data I created below. It also happened when I wrote a function and used it with apply instead of a for loop. It must be something wrong with the loop, but I'm stuck. Any thoughts?

Here is the sample data: a data frame with three variables (X, Y, Z) and 18 observations.

X <- rep(c(1,2,3), 6)
Y <- rep(c(1,2), 9)
Z <- rep(c(1,2,3,4,5,6), 3)
XYZ_df <- as.data.frame(cbind(X,Y,Z))

I count the number of levels for each variable:

levelsX <- as.numeric(nlevels(as.factor(XYZ_df$X)))
levelsY <- as.numeric(nlevels(as.factor(XYZ_df$Y)))
levelsZ <- as.numeric(nlevels(as.factor(XYZ_df$Z)))

The results are correct: levelsX is 3, levelsY is 2 and levelsZ is 6.
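For comparison, the same manual count can be applied to every column at once without building any strings. This is a minimal sketch, assuming only the per-column level counts of XYZ_df are needed; the name `level_counts` is illustrative. `sapply()` hands each column vector itself (not its name) to the function:

```r
# Rebuild the sample data from the question
X <- rep(c(1, 2, 3), 6)
Y <- rep(c(1, 2), 9)
Z <- rep(c(1, 2, 3, 4, 5, 6), 3)
XYZ_df <- as.data.frame(cbind(X, Y, Z))

# sapply() iterates over the columns of the data frame, passing each
# column's values to the anonymous function; names are kept on the result
level_counts <- sapply(XYZ_df, function(col) nlevels(as.factor(col)))
print(level_counts)  # X = 3, Y = 2, Z = 6
```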

But when I loop it, the result changes. I created a vector of the variable names, then looped over it, pasting the XYZ_df$ prefix onto each name:

vars <- c("X", "Y", "Z")
outlist <- list()
for (a in vars) {
  levels <- as.numeric(nlevels(as.factor(paste("XYZ_df$",a, sep=""))))
  out <- c(a, levels)
  outlist <- c(outlist, list(out))
}

final <- do.call("rbind", outlist)

When I do this, the levels entry for each variable is 1.
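A likely explanation is that paste() returns a character string, not the column the string spells out. `paste("XYZ_df$", a, sep="")` evaluates to the single string "XYZ_df$X", and a factor built from one string has exactly one level. Below is a sketch of the loop that looks the column up by name with `[[` instead, assuming the rest of the loop should stay as written; `txt` and `n_levels` are illustrative names (the latter avoids masking base R's `levels()` function):

```r
X <- rep(c(1, 2, 3), 6)
Y <- rep(c(1, 2), 9)
Z <- rep(c(1, 2, 3, 4, 5, 6), 3)
XYZ_df <- as.data.frame(cbind(X, Y, Z))

# What the original loop actually factored: a length-1 character vector
txt <- paste("XYZ_df$", "X", sep = "")
print(nlevels(as.factor(txt)))  # 1: the only level is the string "XYZ_df$X"

# Fix: XYZ_df[[a]] returns the column whose name is stored in a
vars <- c("X", "Y", "Z")
outlist <- list()
for (a in vars) {
  n_levels <- nlevels(as.factor(XYZ_df[[a]]))
  outlist <- c(outlist, list(c(a, n_levels)))
}
final <- do.call("rbind", outlist)
print(final)
```

Because `c(a, n_levels)` mixes a string and a number, each row (and so `final`) is a character matrix; the counts come back as "3", "2" and "6".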

As I said, this happened with both datasets, so I know the problem is in my code. It also happened with apply. And it happened both the way shown above (pasting XYZ_df$ onto the variable name) and when I used attach(XYZ_df) and entered just the variable name.
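The attach() variant probably fails for the same reason: inside the loop `a` is still the string "X", so `as.factor(a)` factors the name rather than the attached column. A sketch of two name-based alternatives, assuming the goal is the same level counts: `get()` resolves a string to the object of that name, and a function used with sapply can take the name plus the data frame and index with `[[`. The helper `count_levels` is hypothetical, not from the question:

```r
X <- rep(c(1, 2, 3), 6)
Y <- rep(c(1, 2), 9)
Z <- rep(c(1, 2, 3, 4, 5, 6), 3)
XYZ_df <- as.data.frame(cbind(X, Y, Z))

a <- "X"
# as.factor(a) factors the one-element string "X", hence 1 level
one_level <- nlevels(as.factor(a))

# get() looks up the object named by the string; here it finds the vector X
# created above (after attach(), the search path would include the data too)
via_get <- nlevels(as.factor(get(a)))

# Function-based version: pass the name and the data frame, and index with
# [[ ]] so attach() is not needed (count_levels is a hypothetical helper)
count_levels <- function(name, df) nlevels(as.factor(df[[name]]))
via_sapply <- sapply(c("X", "Y", "Z"), count_levels, df = XYZ_df)

print(c(one_level = one_level, via_get = via_get))
print(via_sapply)
```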
