Using fread with foreach and doParallel in R


I used fread with the foreach and doParallel packages in R 3.2.0 on Ubuntu 14.04. The following code works just fine, even though I didn't call registerDoParallel.

library(foreach)
library(doParallel)
library(data.table)

write.csv(iris,'test.csv',row.names=F)

cl<-makeCluster(4)

tmp<-foreach(i=1:10) %dopar% { t <- fread('test.csv') }

tmp<-rbindlist(tmp)

stopCluster(cl)

However, after switching to Windows 7, it no longer works, with or without registerDoParallel.

library(foreach)
library(doParallel)
#library(doSNOW)
library(data.table)

write.csv(iris,'test.csv',row.names=F)

cl<-makeCluster(4) 
registerDoParallel(cl)
#registerDoSNOW(cl)

tmp<-foreach(i=1:10) %dopar% { t <- fread('test.csv') }

tmp<-rbindlist(tmp)

stopCluster(cl)

The doSNOW package doesn't work either. Below is the error message.

Error in { : task 1 failed - "could not find function "fread""

Does anyone have any similar experience?


A follow-up question concerns nested foreach loops. The following does not seem to work.

cl<-makeCluster(4)
registerDoParallel(cl)
clusterEvalQ(cl , library(data.table))

tmp<-foreach(j=1:10) %dopar% {
    tmp1<-foreach(i=1:10) %dopar% {
        t<-fread('test.csv',data.table=T)
    }
    rbindlist(tmp1)
}
stopCluster(cl)

   

Lamothy On BEST ANSWER

Thanks to user20650 for the reference. The problem can be solved by setting .export='fread' in the foreach call, which makes fread available on each worker.

More precisely, the following will fix the problem.

tmp<-foreach(i=1:10,.export = 'fread') %dopar% {
    t <- fread('test.csv',data.table=T)
}
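As an alternative to .export, foreach also accepts a .packages argument that loads a package on every worker before the loop body runs. The following is only a minimal sketch of that variant, not something tested on the original Windows 7 setup; the worker count of 2 and the names parts and res are arbitrary choices for illustration.

```r
library(foreach)
library(doParallel)
library(data.table)

write.csv(iris, 'test.csv', row.names = FALSE)

cl <- makeCluster(2)      # 2 workers, chosen arbitrarily for this sketch
registerDoParallel(cl)

# .packages asks foreach to load data.table on each worker,
# so fread can be found inside the loop body
parts <- foreach(i = 1:4, .packages = 'data.table') %dopar% {
    fread('test.csv')
}

# combine the per-iteration tables on the master
res <- rbindlist(parts)

stopCluster(cl)
```

Compared with .export='fread', this makes the whole package visible on the workers, which may be more convenient if the loop body calls several data.table functions.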

As for my follow-up question about nested foreach, user20650 answered it in the comments: add clusterEvalQ(cl , c(library(data.table),library(foreach))) so that every worker loads both packages. The following code seems to work on both Ubuntu and Windows.

cl<-makeCluster(4)
registerDoParallel(cl)
clusterEvalQ(cl , c(library(data.table),library(foreach)))

tmp<-foreach(j=1:10) %dopar% {
    tmp1<-foreach(i=1:10) %dopar% { t <- fread('test.csv',data.table=T) }
    rbindlist(tmp1)
}

stopCluster(cl)
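A different way to write the nested loop is the %:% nesting operator from the foreach package, which merges the two loops into a single stream of tasks instead of calling %dopar% again from inside a worker. The sketch below is an assumption about how that could look for this case, not what user20650 suggested; the loop sizes, the worker count, and the name nested are illustrative only.

```r
library(foreach)
library(doParallel)
library(data.table)

write.csv(iris, 'test.csv', row.names = FALSE)

cl <- makeCluster(2)      # 2 workers, chosen arbitrarily for this sketch
registerDoParallel(cl)

# load data.table on every worker, as in the answer above
invisible(clusterEvalQ(cl, library(data.table)))

# %:% nests the two loops; the inner .combine binds the tables
# produced for one value of j into a single data.table
nested <- foreach(j = 1:3) %:%
    foreach(i = 1:2, .combine = function(a, b) rbindlist(list(a, b))) %dopar% {
        fread('test.csv')
    }

stopCluster(cl)
```

Here nested is a list with one combined table per value of j. Only foreach needs to be loaded on the master in this form, since no worker runs a foreach loop of its own.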