I need to read data from text files (many of them and very large), that typically look as follows:
#2013#3090050010#CCOU#01#022#1#N#16/03/2015
#2013#3090050010#CCOU#01#023#1##16/03/2015
#2013#3090050010#CCOU#02#005#1#1692528#16/03/2015
#2013#3090430110#CCOU#15#504#2#blablablablablablablablablablablablablab
labla#01/10/2014
#2013#3090430110#CCOU#15#505#2##01/10/2014
So "#" is the separator and sometimes long records use two lines. I have a workaround whereby I ignore lines that do not begin with "#", using grep:
x <- readLines("data.txt")
y <- grep("^#", x)   # indices of lines that begin with "#"
app <- x[y]
NamesForCols <- c("..",...)
myDat <- read.table(text = app, header = FALSE, sep = "#", quote = "",
                    col.names = NamesForCols,
                    colClasses = c("NULL", "factor", NA, NA, NA, NA, NA, "character", "NULL"),
                    fill = TRUE, blank.lines.skip = TRUE, comment.char = "",
                    allowEscapes = TRUE)
But I am not happy with this solution: it discards the continuation lines, so significant data is lost. Is there a way to read the data.txt file so that each record necessarily contains the "#" symbol exactly 8 times, even if this sometimes means reading two lines? Any other suggestion would be welcome. Thank you!
You could do the following:
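As a minimal sketch of this first step, the file can be read line by line with `readLines`, as in the question. So that the snippet runs on its own, the sample records from the question are first written to a temporary file; `sample_lines` and `path` are illustrative names, and in practice `path` would simply be "data.txt".

```r
# Sample records from the question; the fourth record is split over two lines
sample_lines <- c(
  "#2013#3090050010#CCOU#01#022#1#N#16/03/2015",
  "#2013#3090050010#CCOU#01#023#1##16/03/2015",
  "#2013#3090050010#CCOU#02#005#1#1692528#16/03/2015",
  "#2013#3090430110#CCOU#15#504#2#blablablablablablablablablablablablablab",
  "labla#01/10/2014",
  "#2013#3090430110#CCOU#15#505#2##01/10/2014"
)

# Stand-in for data.txt so the example is self-contained
path <- tempfile(fileext = ".txt")
writeLines(sample_lines, path)

# One element of x per physical line of the file
x <- readLines(path)
```

At this point `x` still holds the broken record as two separate elements.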
Now combine the lines that don't start with # with the preceding one:
Now you can read that, e.g., via:
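One possible way to do the merging and the parsing is sketched below. It rests on a few assumptions: a continuation line is glued to the previous line without any separator, every field is read as character (which keeps leading zeros such as "01"), and the empty field before the first "#" is dropped. The names `starts`, `id`, `merged`, and `dat` are illustrative, and `x` is rebuilt from the sample in the question so the block runs on its own.

```r
# Lines as returned by readLines(); the fourth record spans two elements
x <- c(
  "#2013#3090050010#CCOU#01#022#1#N#16/03/2015",
  "#2013#3090050010#CCOU#01#023#1##16/03/2015",
  "#2013#3090050010#CCOU#02#005#1#1692528#16/03/2015",
  "#2013#3090430110#CCOU#15#504#2#blablablablablablablablablablablablablab",
  "labla#01/10/2014",
  "#2013#3090430110#CCOU#15#505#2##01/10/2014"
)

# TRUE for lines that begin a new record
starts <- grepl("^#", x)

# Running record id: a continuation line inherits the id of the line before it
id <- cumsum(starts)

# Glue the pieces of each record back together (assumption: no separator)
merged <- unname(vapply(split(x, id), paste, character(1), collapse = ""))

# Parse the repaired records; the first (empty) field is skipped via "NULL"
dat <- read.table(text = merged, header = FALSE, sep = "#", quote = "",
                  comment.char = "",
                  colClasses = c("NULL", rep("character", 8)))
```

Because `cumsum(starts)` only increases on lines that begin with "#", this also handles records broken over more than two lines. If the line break replaced a space in the original text, `collapse = " "` would be the appropriate choice instead.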
Which results in: