I have a relatively large .dta file, with 1280000 observations, that works fine in Stata, but I am having trouble importing it into R.
The data was created with Stata 15. Because it contains strL or str# (# > 244) variables, it cannot be saved in the Stata 12 format.
I am trying to use the haven package to import the saved data using read_dta(), but it gives me the following error message: "Failed to parse /Users/folder/my_data.dta: Unable to allocate memory."
Does anyone know what might be causing this problem and how to overcome it to be able to import the data in R?
I have attempted to overcome the issue in multiple ways, but none of my attempts appears to work.
First, I tried to increase the memory limit of my R environment using
Sys.setenv('R_MAX_VSIZE'=32000000000)
but the console reports the same error when I try to import the data, so the problem does not appear to be related to the memory available to R.
Next, I tried to save the data in Stata 13 format, using
saveold my_data13, version(13)
in Stata, but importing it into R with haven still produces the same error message.
Finally, I tried the readstata13 function
read.dta13(my_data13)
, but this regularly ends with R crashing.
The strange thing is that I am able to open the data correctly in Stata, by simply double-clicking on it.
Does anyone have any suggestions on how to address this issue? Any insight on a) the meaning of the error message and how to address it, b) alternative packages able to open Stata 15 files, or c) another approach to opening the data in R would be most welcome.
Thanks a lot in advance for your help
Best Regards
I just want to wrap up everything mentioned in the comments.
From the official Stata documentation, we have the following:
From the above information, I inferred two points before trying to replicate your problem.
In your case, the use of strL was not necessary.
The use of strL was the source of your problem, probably through some compatibility problem with the haven library.
However, after trying to replicate what you described, I arrived at a different conclusion.
Please consider the following code, which emulates your problem.
Afterward, I applied your modification to the data I generated.
Weirdly enough, I was able to import both files into R using the haven library with the following code.
The only differences here could be:
Finally, I think the source of the problem wasn't the strL type of the data but the memory available on your machine, which was probably solved by the compress step in the for loop you described.
PS: Everything was run on Win10, with R version 4.0.3 (2020-10-10) and haven_2.3.1.
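If memory really is the bottleneck, one thing that might be worth trying is reading the file in smaller pieces. The sketch below is only an illustration, not something taken from your setup: the file is a small temporary stand-in created with write_dta(), and the names demo, path and chunk_size are made up. It relies on the n_max, skip and col_select arguments of read_dta(). Whether this avoids the allocation error depends on where exactly the parser fails, which I could not establish.

```r
library(haven)

# Hypothetical stand-in for the real file: 1000 rows with one long string column.
path <- tempfile(fileext = ".dta")
demo <- data.frame(
  id   = 1:1000,
  text = strrep("a", 300),   # 300 characters, i.e. longer than str244
  stringsAsFactors = FALSE
)
write_dta(demo, path, version = 15)

# 1. Read only the first rows to check that the file parses at all.
head_rows <- read_dta(path, n_max = 10)

# 2. Read a single column, which is also a cheap way to get the row count.
ids_only <- read_dta(path, col_select = "id")
n_rows   <- nrow(ids_only)

# 3. Read the file in chunks of chunk_size rows and bind them together.
chunk_size <- 250
offsets    <- seq(0, n_rows - 1, by = chunk_size)
chunks     <- lapply(offsets, function(off) {
  read_dta(path, skip = off, n_max = chunk_size)
})
full <- do.call(rbind, chunks)
```

With the real data you would replace path with the path to your file and probably process or save each chunk before reading the next one, instead of binding everything back into a single object.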