I'm trying to import some data in R but the imported data differs from the original one in the smaller digits in a way I'm not really able to understand why.
First I tried to import with read.csv the file "test 1.csv", which contains a single number:
1761,74956350536
But the imported number is not identical to the original one:
> x <- read.csv2("test 1.csv", header = FALSE)
> print(x, digits = 20)
V1
1 1761.7495635053599017
I first considered that R might be attempting to read the data into a kind of variable which would not accommodate this many digits. But in fact, the number read by R is "more precise" (has more digits) than the original one. And it was correctly read as a numeric type (thus double precision floating point).
I then tried to write x back to another .csv file:
> write.csv2(x, "test 1 out.csv", row.names = FALSE, col.names = FALSE)
With the result that the original number (1761,74956350536) was "restored".
I then thought it might be something on the way the print function works and the number read by read.csv was correct. I then created a spreadsheet in Excel (test 2.xlsx) with five numbers with the sum of these in the sixth column. Then exported it to a .csv file. The resulting file (test 2.csv) reads:
1761,749563505360000;2819,170384367300000;39,526631074886200;1600,629524205790000;0,556813868271186;6221,632917021610000
Notice the first number of the series is the same as the one contained in the original "test 1.csv" file. I then imported it into R and checked the sum.
> x <- read.csv2("test 2.csv", header = FALSE)
> conf <- (sum(x[1:5])-x[6])
> conf
V6
1 -9.094947e-13
> print(conf, digits = 20)
V6
1 -9.0949470177292823792e-13
And the first five numbers no longer add up to the sixth.
I tried other import methods, directly from the .xlsx file, using both xlsx and openxlsx packages:
> library(xlsx)
> x <- read.xlsx("test 2.xlsx", 1, header = FALSE)
> print(x[1], digits = 20)
X1
1 1761.7495635053587648
> conf <- (sum(x[1:5])-x[6])
> conf
X6
1 0
> print(conf, digits = 20)
X6
1 0
> write.xlsx(x, "test 2 out.xlsx", col.names=FALSE, row.names=FALSE)
And:
> library(openxlsx)
> x <- read.xlsx("test 2.xlsx", 1, colNames = FALSE)
> print(x[1], digits = 20)
X1
1 1761.7495635053587648
> conf <- (sum(x[1:5])-x[6])
> conf
X6
1 0
> print(conf, digits = 20)
X6
1 0
> write.xlsx(x, "test 2 out.xlsx", col.names=FALSE, row.names=FALSE)
Now we got still a third number, different from the original one and also different from the one read by read.csv. Once again write.xlsx "restores" the original numbers to "test 2 out.xlsx". In this case the checked sum matches, but unfortunately this is not the case for these packages and functions for a larger set of numbers with which I'm currently working on.
I feel I'm asking something which either obvious or well known (I'm a newbie...), for the operation is quite simple. But I'm indeed at a loss here. Does anyone know why this happens? Is there a way to avoid/correct/circumvent this behavior? Any help would be very appreciated.
(I'm running R version 3.2.0 (through RStudio) on a Linux Mint 17.1 Rebecca machine.)