Problems when importing factor variables from Stata using readstata13 package


I have a very odd problem. I'm importing some factor variables from Stata into R using the readstata13 package. The imported labels/levels look OK, but they change when I remove the factor class. Here is the Stata description of the variable (here is the data for reproducibility):

[Image: Stata description of the variable, showing its value labels and frequencies]

Notice that some labels are missing (UPDATE: they are not actually missing. Instead, they are filled with a space, an odd convention the coder used to mark a missing label). Notice also that value 13 has 7 observations.

So I import the data into R and check the levels and frequencies. Everything looks fine:

[Image: levels and frequencies of the imported factor in R]

Then I remove the levels using as.integer() (or as.numeric()), but things get messed up, in particular values 11, 12 and 13. Notice that now 11 has 7 observations, rather than 13:
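A minimal base-R sketch of what as.integer() does to a factor, under a hypothetical setup: codes 1 to 10 and 13 carry labels, while 11 and 12 never occur. The data, the labels and the `stata_codes` vector are illustrative assumptions, not the poster's file, and this does not claim to reproduce readstata13's internals. It shows that as.integer() returns each value's position in the levels, not the original code, and how the codes can be recovered when the code vector is known.

```r
# Hypothetical data: codes 1-10 and 13 are labelled; 11 and 12 never occur.
stata_codes <- c(1:10, 13)
x <- c(1:10, 13, 13, 13)

# One factor level per labelled code (illustrative labels).
f <- factor(x, levels = stata_codes,
            labels = c(paste("label", 1:10), "label 13"))

# as.integer() gives the position in levels(f), not the Stata code,
# so code 13 (the 11th level) comes back as 11.
pos <- as.integer(f)
table(pos)

# With the original code vector available, index it by position
# to map the positions back to the codes.
recovered <- stata_codes[pos]
table(recovered)
```

This recovery only works when the full, correctly ordered vector of labelled codes is available, which is exactly the information lost when the factor is converted to integers.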

[Image: frequencies after converting with as.integer(), with 11 now showing 7 observations]

The problem remains regardless of the read.dta13 options related to factors. I tried the second suggestion in this answer, using the following code, but it did not work (most likely because only two values have labels):

labname <- get.label.name(data,"J_Itm1")
labtab <- get.label(data, labname)
table(get.origin.codes(data$J_Itm1, labtab))

Any idea how to solve the problem?

Best answer, by luchonacho:

It seems the problem is that the readstata13 package recreates the factor values in R without keeping the order they have in Stata.

The "solution" was not to import the levels from Stata at all, which can be done with the convert.factors = FALSE option of read.dta13(). Although it is not an optimal solution, it works for me because I do not need the factor levels in the first place. I raised an issue on the package's website to see whether there are better solutions.