R: read.fwf defines integer as numeric


I have a fixed-width .txt file and am using RStudio. The first few lines look like this:

200416657210340 1665721 20040608 20090930 20060910 20070910 20080827 20090804
200416657210345 1665721 20040907 20090203 20070331 20080719                  
200416657210347 1665721 20040914 20091026 20070213 20080114 20090302         
200416657210352 1665721 20041111 20100315 20070123 20071205          20081202

I am trying to read in the .txt file using read.fwf:

gripalisti <- read.fwf(file = "gripalisti.txt",
                       widths = c(15,8,9,9,9,9,9,9),
                       header = FALSE,
                       #stringsAsFactors = FALSE,
                       col.names = c("einst","bu","faeding","forgun","burdur1",
                                     "burdur2","burdur3","burdur4"))

This works and the columns have the correct widths. However, "einst" and "bu" are supposed to be integer values and the rest are supposed to be dates.

When imported all the values in the first column (ID variables) look like this:

2.003140e+14

I have been searching for a way to change the imported column to integer (or character?) values, but everything I have found results in an error. One example that I tried after googling:

gripalisti <- read.fwf(file = "gripalisti.txt",
                       widths = c(15,8,9,9,9,9,9,9),
                       header = FALSE,
                       #stringsAsFactors = FALSE,
                       col.names = c("einst","bu","faeding","forgun","burdur1",
                                     "burdur2","burdur3","burdur4"),
                       colclasses = c("integer", "integer", "Date", "Date",
                                      "Date", "Date", "Date", "Date"))

results in the error:

Error in read.table(file = FILE, header = header, sep = sep, row.names = row.names,  : 
  unused argument (colclasses = c("integer", "integer", "Date", "Date", "Date", "Date", "Date", "Date"))

The dataset has over 100,000 lines and many missing values, so other ways of importing have not worked for me. The dataset is NOT tab delimited.

Sorry if this is obvious, I am a very new R user.

edit:

Thanks for the help, I changed it to:

 colClasses = c("character", 

And now it looks good.


There are 2 answers

Answer by r2evans

As suggested in the comments:

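  1. the argument is colClasses=, not colclasses= (R is case-sensitive), which is why read.table reports an unused argument;
  2. the first field cannot be stored as "integer" because its 15-digit values exceed R's 32-bit integer maximum of 2147483647; it must be either "numeric" or "character";
  3. (additionally) the dates are not in the default format of %Y-%m-%d, so you will need to convert them after reading in the data.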
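
To illustrate point 2, here is a minimal sketch using one ID value copied from the sample data. The variable names (id, id_int, id_num) are only for illustration.

```r
# R's integer type is 32-bit; this is the largest value it can hold
.Machine$integer.max
# [1] 2147483647

id <- "200416657210340"   # first ID from the sample file

# Coercing to integer gives NA (with a warning), because the value is out of range
id_int <- suppressWarnings(as.integer(id))
is.na(id_int)
# [1] TRUE

# A double ("numeric") holds it exactly, since integers are exact up to 2^53
id_num <- as.numeric(id)
format(id_num, scientific = FALSE)
# [1] "200416657210340"
```

For an ID column that is never used in arithmetic, "character" is usually the safer of the two choices, since it also keeps any leading zeros.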

Prep:

writeLines("200416657210340 1665721 20040608 20090930 20060910 20070910 20080827 20090804\n200416657210345 1665721 20040907 20090203 20070331 20080719                  \n200416657210347 1665721 20040914 20091026 20070213 20080114 20090302         \n200416657210352 1665721 20041111 20100315 20070123 20071205          20081202",
           con = "gripalisti.txt")

Execution:

dat <- read.fwf("gripalisti.txt", widths = c(15,8,9,9,9,9,9,9), header = FALSE,
                col.names = c("einst","bu","faeding","forgun","burdur1", "burdur2","burdur3","burdur4"),
                colClasses = c("character", "integer", "character", "character", "character", "character", "character", "character"))
str(dat)
# 'data.frame': 4 obs. of  8 variables:
#  $ einst  : chr  "200416657210340" "200416657210345" "200416657210347" "200416657210352"
#  $ bu     : int  1665721 1665721 1665721 1665721
#  $ faeding: chr  " 20040608" " 20040907" " 20040914" " 20041111"
#  $ forgun : chr  " 20090930" " 20090203" " 20091026" " 20100315"
#  $ burdur1: chr  " 20060910" " 20070331" " 20070213" " 20070123"
#  $ burdur2: chr  " 20070910" " 20080719" " 20080114" " 20071205"
#  $ burdur3: chr  " 20080827" "         " " 20090302" "         "
#  $ burdur4: chr  " 20090804" "         " "         " " 20081202"

dat[,3:8] <- lapply(dat[,3:8], as.Date, format = "%Y%m%d")
dat
#             einst      bu    faeding     forgun    burdur1    burdur2    burdur3    burdur4
# 1 200416657210340 1665721 2004-06-08 2009-09-30 2006-09-10 2007-09-10 2008-08-27 2009-08-04
# 2 200416657210345 1665721 2004-09-07 2009-02-03 2007-03-31 2008-07-19       <NA>       <NA>
# 3 200416657210347 1665721 2004-09-14 2009-10-26 2007-02-13 2008-01-14 2009-03-02       <NA>
# 4 200416657210352 1665721 2004-11-11 2010-03-15 2007-01-23 2007-12-05       <NA> 2008-12-02

str(dat)
# 'data.frame': 4 obs. of  8 variables:
#  $ einst  : chr  "200416657210340" "200416657210345" "200416657210347" "200416657210352"
#  $ bu     : int  1665721 1665721 1665721 1665721
#  $ faeding: Date, format: "2004-06-08" "2004-09-07" "2004-09-14" "2004-11-11"
#  $ forgun : Date, format: "2009-09-30" "2009-02-03" "2009-10-26" "2010-03-15"
#  $ burdur1: Date, format: "2006-09-10" "2007-03-31" "2007-02-13" "2007-01-23"
#  $ burdur2: Date, format: "2007-09-10" "2008-07-19" "2008-01-14" "2007-12-05"
#  $ burdur3: Date, format: "2008-08-27" NA "2009-03-02" NA
#  $ burdur4: Date, format: "2009-08-04" NA NA "2008-12-02"
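
With over 100,000 lines it may be worth checking that the only NA dates come from blank fields and not from malformed values. Below is a rough sketch of one way to do that; parse_fwf_date is a hypothetical helper name, not part of base R, and the sample vector is made up for the demonstration.

```r
# Hypothetical helper: convert fixed-width date strings and warn about
# non-blank entries that could not be parsed with the given format.
parse_fwf_date <- function(x, format = "%Y%m%d") {
  trimmed <- trimws(x)                        # drop the padding kept by read.fwf
  out <- as.Date(trimmed, format = format)    # blank strings become NA
  bad <- !is.na(trimmed) & nzchar(trimmed) & is.na(out)
  if (any(bad)) {
    warning(sum(bad), " non-blank value(s) could not be parsed as dates")
  }
  out
}

# One valid date and one blank field, padded as in the file
res <- parse_fwf_date(c(" 20040608", "         "))
res
# [1] "2004-06-08" NA
```

Assuming dat was read with character date columns as above, the same helper could be applied with dat[, 3:8] <- lapply(dat[, 3:8], parse_fwf_date).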

Answer by Anup Tirpude

The number in the first column is very large, so if you import it as numeric it will automatically be shown in exponent (scientific) notation. One way to resolve this is to set scipen before reading the file. Use the code below:

options(scipen = 999)


I think this should resolve your problem.
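
Note that scipen only changes how numbers are printed, not how they are stored. A small sketch, using one ID value from the question and illustrative variable names (x, old, shown):

```r
x <- 200416657210340            # stored as a double; exact, since it is below 2^53

old <- options(scipen = 999)    # strongly prefer fixed over scientific notation
shown <- format(x)              # format() honours the scipen option
options(old)                    # restore the previous setting

shown
# [1] "200416657210340"

# The stored value is the same either way; only the display differs
x == as.numeric(shown)
# [1] TRUE
```

So this fixes the display of a numeric column, while reading the column as "character" avoids the question altogether.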

Of course, the date columns still need some work after reading. For that you can use a simple command like as.Date(gripalisti$burdur1, format = "%Y%m%d").