Use of fread() from data.table causes R session to abort


I am working on a project for a MOOC and was tinkering with the data.table package in RStudio. Using fread() to import the data files initially worked fine:

fread("UCI HAR Dataset/features.txt")->features
fread("UCI HAR Dataset/test/y_test.txt")->ytest

However, when I tried to run the following line of code, I received a pop-up that said "R Session Aborted: R encountered a fatal error. The session was terminated."

fread("UCI HAR Dataset/test/X_test.txt")->xtest

I don't understand what the problem is. I checked the file names and paths to make sure I had correctly spelled and capitalized everything, and it all checks out. The equivalent code using read.table() works fine and does not cause R to abort. I also tried renaming the file to "x_test.txt", but the same issue occurred.

According to ?fread, the function only works with "regular delimited files." As far as I can tell, this file is a regular delimited file, in that all rows have the same number of columns. When I read it with read.table() instead, there are no NA cells; I checked using anyNA(). Is there a quick way to determine whether a file is regularly delimited? Is there something else about the original file that could be causing the problem?


UPDATE

After further research, including searching through the issues reported on the developer's GitHub, I think my problem lies in the two white spaces at the beginning of each row, which is discussed in one of those issues. I am unsure why R aborted instead of giving me a warning. The latest development version of data.table (1.9.5), however, does not abort the session under the same conditions.
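One way to confirm a leading-whitespace pattern like the one described in the update is to inspect the raw lines before parsing anything. The sketch below uses only base R and a small hypothetical sample file (`tmp`, with made-up values) standing in for X_test.txt; it is not the actual UCI data. It also shows that count.fields() and read.table() with their default whitespace separator skip leading spaces, which is consistent with read.table() handling the real file without trouble.

```r
# Hypothetical sample standing in for X_test.txt (made-up values):
# every row starts with two spaces and fields are whitespace-separated.
tmp <- tempfile(fileext = ".txt")
writeLines(c("  1.5 -2.0  3.25",
             "  4.0  5.5 -6.75"), tmp)

# Look at the raw text of the first few lines and flag leading whitespace.
first_lines <- readLines(tmp, n = 5)
starts_with_space <- grepl("^[[:space:]]", first_lines)
print(starts_with_space)

# With the default sep = "" (any run of whitespace), leading spaces are
# skipped, so each row counts as 3 fields rather than gaining an empty one.
field_counts <- count.fields(tmp)
print(table(field_counts))

# read.table() applies the same whitespace rule, so the sample parses cleanly.
xtest <- read.table(tmp)
print(dim(xtest))
print(anyNA(xtest))
```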

Answer by IRTFM

Although I do believe you should have contacted the package maintainer first in any situation where the R session aborted (and it was not due to your mucking with C code), I can offer a strategy for your last request. It is not specific to fread, but I've found it useful with ordinary read.table()-style reads. I'm going to assume this is a comma-separated file, but if it's whitespace-separated you could change sep="," to sep="".

filcnts <- count.fields("UCI HAR Dataset/test/X_test.txt", sep=",")
table(filcnts)

That should produce a table with a single entry, meaning every line has the same number of fields. If not, try adjusting parameters such as quote, sep, blank.lines.skip, or comment.char.
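When the table shows more than one count, it helps to find which lines disagree. A minimal base-R sketch, using a small hypothetical comma-separated file with one malformed row, treats the most common field count as the expected width and lists the lines that differ. The sample data is made up; also note that count.fields() ignores blank lines by default (blank.lines.skip = TRUE), so the reported positions may not line up with physical line numbers if the file contains blank lines.

```r
# Hypothetical sample (made-up values): the third row has an extra field.
tmp <- tempfile(fileext = ".csv")
writeLines(c("1,2,3", "4,5,6", "7,8,9,10", "11,12,13"), tmp)

filcnts <- count.fields(tmp, sep = ",")
counts_table <- table(filcnts)
print(counts_table)

# Treat the most frequent field count as the expected width.
expected <- as.integer(names(which.max(counts_table)))

# which() drops NA entries (count.fields returns NA for lines that start a
# multi-line quoted field), leaving only rows with a definite mismatch.
bad_lines <- which(filcnts != expected)
print(bad_lines)

# Show the offending raw lines for inspection.
print(readLines(tmp)[bad_lines])
```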