I'm trying to parallelize the example below.
I have a set of rasters that I want to aggregate by week of the year. Here is the serial version:
library(raster)
# create a raster stack from list of GeoTiffs
tifs <- list.files(path = "./inputData/", pattern = "\\.tif$", full.names = TRUE)
r <- stack(tifs)
# get the date from the names of the layers and extract the week
indices <- format(as.Date(names(r), format = "X%Y.%m.%d"), format = "%U")
indices <- as.numeric(indices)
# calculate weekly means
r_week <- stackApply(r, indices, function(x) mean(x, na.rm = TRUE))
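To make the index step concrete, here is a small sketch of how the week numbers come out of the layer names. The layer names are hypothetical stand-ins for what `names(r)` returns. Note that `%U` counts weeks starting on Sunday and gives `00` for days before the year's first Sunday, and that layers from different years with the same week number would end up with the same index.

```r
# Hypothetical layer names in the same "X%Y.%m.%d" form as names(r)
layer_names <- c("X2020.01.01", "X2020.01.05", "X2020.01.12")

# Parse the dates, then extract the week of the year (Sunday-based)
dates <- as.Date(layer_names, format = "X%Y.%m.%d")
weeks <- as.numeric(format(dates, format = "%U"))

print(weeks)
# 2020-01-01 was a Wednesday, so it falls in week 0;
# the first Sunday (2020-01-05) starts week 1
```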
This is my attempt at parallelization using snow and pbapply.
# aggregate rasters in parallel
no_cores <- parallel::detectCores() - 1
tryCatch({
  cl <- snow::makeCluster(no_cores, "SOCK")
  snow::clusterEvalQ(cl, {
    require(pacman)
    p_load(dplyr,
           rts,
           raster,
           stringr,
           pbapply,
           parallel)
  })
  parallel::clusterExport(cl = cl, varlist = list("r", "indices"))
  r_week <- pbapply::pbsapply(r, indices, stackApply(r, indices, function(x) mean(x, na.rm = TRUE)), simplify = TRUE, USE.NAMES = TRUE, cl = cl)
  snow::stopCluster(cl)
}, error = function(e) {
  snow::stopCluster(cl)
  return(e)
}, finally = {
  try(snow::stopCluster(cl), silent = TRUE)
})
The stackApply() method does not take a cluster argument, so I'm trying to wrap it in a pbsapply(). This returns the following error:
<simpleError in get(as.character(FUN), mode = "function", envir = envir): object 'indices' of mode 'function' was not found>
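The message looks like an argument-matching problem rather than a cluster problem. Assuming pbsapply() resolves its function argument the same way base sapply() does (the wording of the error suggests so), the second positional argument is taken as FUN, so `indices` is looked up as a function. A minimal sketch with base sapply() and made-up values gives the same kind of message:

```r
# Stand-in values; only the argument positions matter here
indices <- c(1, 1, 2)

# The second positional argument of sapply() is FUN, so `indices`
# is looked up as a function and the lookup fails
msg <- tryCatch(
  sapply(1:3, indices, function(x) x),
  error = function(e) conditionMessage(e)
)

print(msg)
```

Separately, the third argument in the attempt, `stackApply(r, indices, ...)`, is a call rather than a function, so even with the arguments reordered it would be evaluated once up front instead of being run on the workers.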
I think I found a workaround using the raster::clusterR() method, but it doesn't provide a progress bar. It would be great if someone knows how to do this with snow and pbapply.
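For reference, one possible shape for a pbapply version is to iterate over the unique week numbers and average each week's layers on a worker. This is only a sketch under assumptions: it uses a small synthetic stack in place of the GeoTiffs, a `parallel` socket cluster instead of snow, and `raster::calc()` in place of `stackApply()`. The two-worker cluster size and the `week_` layer names are illustrative choices.

```r
library(raster)
library(pbapply)

# Synthetic stand-in for the GeoTiff stack (hypothetical data)
set.seed(1)
r <- stack(lapply(1:6, function(i) raster(matrix(runif(100), 10, 10))))
indices <- c(1, 1, 2, 2, 3, 3)  # stand-in for the week numbers
weeks <- sort(unique(indices))

# Start the workers, load raster on each, and copy the inputs over
cl <- parallel::makeCluster(2)
parallel::clusterEvalQ(cl, library(raster))
parallel::clusterExport(cl, varlist = c("r", "indices"))

# One task per week; pblapply() draws the progress bar and
# the cluster is stopped whether or not the computation succeeds
week_means <- tryCatch(
  pbapply::pblapply(weeks, function(w) {
    # drop = FALSE keeps a one-layer week as a stack
    layers <- raster::subset(r, which(indices == w), drop = FALSE)
    raster::calc(layers, fun = function(x) mean(x, na.rm = TRUE))
  }, cl = cl),
  finally = parallel::stopCluster(cl)
)

# Reassemble the per-week layers into one stack
r_week <- stack(week_means)
names(r_week) <- paste0("week_", weeks)
```

With file-backed rasters the workers would need to be able to read the same file paths, and the progress bar advances per batch of tasks rather than per cell, so it is coarse when there are only a few weeks.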