Application of mclapply() to a function writing to a global variable


I'm trying to use parallel::mclapply to speed up the calculation of the following code:

library(raster)  
library(HistogramTools)#for AddHistogram
#Create a first h here for the first band... omitted for brevity
readNhist <- function(n,mconst) {
  l <- raster(filename[i], varname=var[i], band=n, na.rm=T)
  gain(l) <- mconst
  h <<- AddHistograms(h, hist(l, plot=F, breaks=histbreaks,right=FALSE))
}
lapply(1:10000, readNhist, mconst=1)
#Then do stuff with the h histogram...

When I run the code above, all is fine. With mclapply (below), the result is far from what I expect: the histograms are all wrong.

library(raster)  
library(HistogramTools)#for AddHistogram
library(parallel)
#Create a first h here for the first band... omitted for brevity
readNhist <- function(n,mconst) {
  l <- raster(filename[i], varname=var[i], band=n, na.rm=T)
  gain(l) <- mconst
  h <<- AddHistograms(h, hist(l, plot=F, breaks=histbreaks,right=FALSE))
}
mclapply(   2:10000, readNhist, mconst=1  )
#Then do stuff with the h histogram...

I feel like there's something vital I'm missing with the application of parallel computation to this function.

1 answer

Answer by AF7 (best answer)

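The problem is the <<-. mclapply forks the R process, and each child works on its own copy of the global environment, so an assignment made with <<- inside a worker changes h only in that child and is lost when the child exits; the parent's h is never updated. (<<- is also bad practice in general, as far as I can gather.)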
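To illustrate why a global assignment made inside a forked worker never reaches the calling session, here is a minimal sketch. The names counter and bump are hypothetical and are not part of the question's code. It assumes a Unix-like system, since forking with mc.cores > 1 is not available on Windows.

```r
library(parallel)

counter <- 0

# Hypothetical worker that accumulates into a global, like h <<- ... above
bump <- function(i) {
  counter <<- counter + i
  counter
}

# Serial run: the assignments happen in this process, so the global changes
invisible(lapply(1:4, bump))
serial_total <- counter

# Forked run: each child updates its own copy of counter;
# the parent's copy is left untouched (assumes a Unix-like OS)
counter <- 0
invisible(mclapply(1:4, bump, mc.cores = 2))
forked_total <- counter
```

After the serial run the global holds the accumulated sum, while after the forked run the parent's counter is still at its reset value. The only thing that comes back from the workers is their return values.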

The function can be rearranged so that it returns each histogram instead of writing to a global:

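readNhist <- function(n, mconst) {
  # Read band n and apply the gain
  l <- raster(filename, varname=var, band=n, na.rm=TRUE)
  gain(l) <- mconst
  # Compute the histogram and return it; no global state is touched
  band_hist <- hist(l, plot=FALSE, breaks=histbreaks, right=FALSE)
  return(band_hist)
}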

And called like this:

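# Each worker returns its histogram; mclapply collects them in a list
hists <- mclapply(2:nbands, readNhist, mconst=gain, mc.cores=ncores)
# Merge the per-band histograms, then add them to h (the first band)
ch <- AddHistograms(x=hists)
h <- AddHistograms(h, ch)
rm(ch, hists)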

This is pretty fast even with a huge number of layers (and thus histograms).
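The same return-then-combine pattern can be sketched without raster or HistogramTools, which may help when testing the idea in isolation. Everything here is an assumption made for illustration: bands is a list of random vectors standing in for raster layers, band_counts is a hypothetical helper, and the bin counts are merged with Reduce rather than AddHistograms.

```r
library(parallel)

set.seed(1)
# Hypothetical stand-in for raster bands: 8 vectors of 1000 values in (0, 1)
bands <- lapply(1:8, function(i) runif(1000))
histbreaks <- seq(0, 1, by = 0.1)

# Return the bin counts instead of writing to a global
band_counts <- function(x) {
  hist(x, plot = FALSE, breaks = histbreaks, right = FALSE)$counts
}

# Forking is unavailable on Windows, so fall back to a single core there
ncores <- if (.Platform$OS.type == "unix") 2 else 1

# Workers return their counts; the parent combines them afterwards
counts_list <- mclapply(bands, band_counts, mc.cores = ncores)
total_counts <- Reduce(`+`, counts_list)
```

Because all histograms share the same breaks, summing the count vectors element by element gives the same result as a serial accumulation, and the order in which the workers finish does not matter.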