R parallel computing with snowfall - writing to files from separate workers


I am using the snowfall 1.84 package for parallel computing and would like each worker to write data to its own separate file during the computation. Is this possible? If so, how?

I am using the "SOCK" type connection e.g., sfInit( parallel=TRUE, ...,type="SOCK" ) and would like the code to be platform independent (unix/windows).

I know it is possible to use the "slaveOutfile" option in sfInit to define a file to which log output is written. But this is intended for debugging, and all slaves/workers must use the same file. I need each worker to have its OWN output file!
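
For reference, this is roughly what the slaveOutfile usage looks like (a minimal sketch with made-up values; note that all workers write to the same log file, which is not what I want):

sfInit(parallel=TRUE, cpus=4, type="SOCK", slaveOutfile="workers.log")  # one shared log for all workers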

The data I need to write are large data frames, NOT simple diagnostic messages. These data frames need to be written out by the slaves and cannot be sent back to the master process. Does anyone know how I can get this done?

Thanks


1 Answer

Answered by Steve Weston:

A simple solution is to use sfClusterApply to execute a function that opens a different file on each of the workers, assigning the resulting file object to a global variable so you can write to it in subsequent parallel operations:

library(snowfall)
nworkers <- 3
sfInit(parallel=TRUE, cpus=nworkers, type='SOCK')

# Open a different output file on each worker, storing the connection
# in a global variable (fobj) on that worker for later parallel operations
workerinit <- function(datfile) {
  fobj <<- file(datfile, 'w')
  NULL
}
sfClusterApply(sprintf('worker_%02d.dat', seq_len(nworkers)), workerinit)

# Each task writes its data frame to the file opened on the current worker
work <- function(i) {
  write.csv(data.frame(x=1:3, i=i), file=fobj)
  i
}
sfLapply(1:10, work)
sfStop()
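
One detail to note (a sketch, not part of the original answer): the connections opened in workerinit are never explicitly closed. If you want the files flushed and closed deterministically, each worker can close its own connection before the sfStop() call, for example with sfClusterEval:

sfClusterEval(close(fobj))  # run close(fobj) on every worker before stopping the cluster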