I have a very odd problem. I'm importing some factor variables from Stata into R using the readstata13 package. The imported labels/levels look fine, but the values change when I remove the factor class. Here is the Stata description of the variable (here is the data for reproducibility):
Notice that some labels are missing (UPDATE: actually, they are not missing; they are filled with a space, an odd way the coder used to flag a missing label). Notice also that value 13 has 7 observations.
So I import the data into R and check the levels and frequencies. All fine:
Then I remove the levels using as.integer() (or as.numeric()), but things get messed up, in particular values 11, 12 and 13. Notice that 11 now has 7 observations, rather than 13:
The problem remains regardless of the read.dta13 options related to factors. I tried the second suggestion in this answer, using the following code, but it did not work (most likely because only two values have labels):
labname <- get.label.name(data,"J_Itm1")
labtab <- get.label(data, labname)
table(get.origin.codes(data$J_Itm1, labtab))
Any idea how to solve the problem?
It seems the problem is that the readstata13 package recreates the factor values in R without keeping the order of those in Stata. The "solution" was to not import the levels from Stata, which can be achieved with the convert.factors = FALSE option. Although not an optimal solution, it works for me because I do not need factor levels in the first place. I raised an issue on the package's website to see potential solutions.
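For anyone trying to reproduce the symptom without the original file, here is a minimal sketch in base R. The toy codes are made up for illustration (they are not the real J_Itm1 data), and I am assuming the imported factor behaves like any ordinary R factor, where as.integer() returns the position of each level rather than the original Stata code. The file name in the last step is hypothetical.

```r
# Toy data (made up): Stata-style codes where only 1..10 and 13 occur,
# with 7 observations of code 13.
codes <- c(1:10, rep(13L, 7))

# As a factor, the codes survive only as level labels.
f <- factor(codes)

# as.integer() returns level positions (1..nlevels), so the 7 observations
# of code 13 show up under 11, the position of the 11th level.
positions <- as.integer(f)
print(table(positions))

# If the level labels are the codes themselves, they can be recovered
# by going through character first.
recovered <- as.integer(as.character(f))
print(table(recovered))

# Workaround from the answer: skip factor conversion at import time.
# "mydata.dta" is a hypothetical path; this branch only runs if the
# package is installed and the file exists.
path <- "mydata.dta"
if (requireNamespace("readstata13", quietly = TRUE) && file.exists(path)) {
  raw <- readstata13::read.dta13(path, convert.factors = FALSE)
  print(table(raw$J_Itm1))
}
```

Note that the as.character() route only helps when the labels are the numeric codes; with text labels (or blank ones, as in my data) importing with convert.factors = FALSE is the simpler option.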