R with FF crashes when loading a large dataset


Good evening,

I am attempting to load a dataset into R (~20 million rows, 140 columns, ~6.2 GB on disk) using either LaF with ffbase, or ff. In either case the load fails.

library(LaF)
library(ffbase)

struct <- detect_dm_csv(file = '/scratch/proj.csv', header = TRUE)
colClasses <- struct$columns[,2]   # detected column types, reused below
ldat <- laf_open(struct)
data <- laf_to_ffdf(ldat)

or, using ff directly:

data <- read.csv.ffdf(file = 'proj.csv', colClasses = colClasses, header = TRUE)
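(For completeness, read.csv.ffdf can also be told to load the file in smaller chunks via first.rows/next.rows, which I may try next to narrow down where it dies. This is only a sketch; the chunk sizes below are arbitrary guesses, not values I've tested on this file:)

library(ff)

# Read the first 100k rows, then append in 500k-row blocks; VERBOSE prints
# progress per block, which should at least show how far it gets before failing.
data <- read.csv.ffdf(file = 'proj.csv', colClasses = colClasses,
                      header = TRUE, first.rows = 100000,
                      next.rows = 500000, VERBOSE = TRUE)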

It chugs along for a bit and then prints a massive number of items such as 1L 1L 1L, which seem to correspond to variables.

It then lists the variables, e.g. variable_name = list(), followed by what looks like a call stack (5: ffdfappend(x, block), 6: laf_to_ffdf(ldat)),

and finally asks how I'd like to exit R.

I've tried sinking the output, but nothing gets written since the sink never gets closed (?), and the sheer volume of output breaks my scroll buffer.
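What I'm planning to try next is skipping sink() entirely and running the load non-interactively, so the shell captures the output even if R itself aborts. Just a sketch; the script and log file names are placeholders:

## load_proj.R -- placeholder script name; run from a shell so the OS keeps
## the log even if the R process crashes:
##   Rscript load_proj.R > load_log.txt 2>&1
library(LaF)
library(ffbase)

struct <- detect_dm_csv(file = '/scratch/proj.csv', header = TRUE)
ldat   <- laf_open(struct)
data   <- laf_to_ffdf(ldat)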

Has anyone experienced this before?

More info: I ran the same script in a Windows 7 virtual machine and it completed fine. By luck I was able to see the error that precedes all the noise; it says something about a "nonexistent physical address", which sounds mmap-related.
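One thing I plan to check (just a guess based on the mmap hint, not a confirmed fix) is where ff is putting its memory-mapped temp files; pointing fftempdir at a local disk with plenty of free space is easy to try:

library(ff)

# Assumption: the mmap error comes from ff's temp files landing on a full or
# network-mounted filesystem. '/scratch/fftmp' is a placeholder path.
options(fftempdir = '/scratch/fftmp')
dir.create(getOption('fftempdir'), recursive = TRUE, showWarnings = FALSE)

data <- read.csv.ffdf(file = '/scratch/proj.csv',
                      colClasses = colClasses, header = TRUE)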

I'm going to try recompiling everything and see how it goes. If you have any further suggestions, please let me know!


1 Answer

Answered by Nikos:

Have you tried data.table's fread?

Can you test:

library(data.table)
# verbose = TRUE prints progress as the file is parsed, which helps show where a load stalls
data <- fread(file = '/scratch/proj.csv', verbose = TRUE)

I have files of a similar size and fread handles them without trouble.
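If memory turns out to be tight, fread can also read just the columns you actually need via its select argument. A sketch; the column names here are placeholders, not names from your file:

library(data.table)

# Placeholder column names -- replace with the subset you actually need.
data <- fread('/scratch/proj.csv',
              select = c('id', 'outcome', 'date'),
              verbose = TRUE)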