I have thousands of lists, and each list contains multiple time series. I would like to apply forecasting to each element in each list. This has become an intractable problem in terms of computing resources. I don't have a background in parallel computing or advanced R programming. Any help would be greatly appreciated.
I have created a dummy list. Basically, dat.list is similar to what I'm working on.
library("snow")
library("plyr")
library("forecast")
## Create Dummy Data
z <- ts(matrix(rnorm(30,10,10), 100, 3), start = c(1961, 1), frequency = 12)
lam <- 0.8
ap <- list(z=z,lam=lam)
z <- ts(matrix(rnorm(30,10,10), 100, 3), start = c(1971, 1), frequency = 12)
lam <- 0.5
zp <- list(z=z,lam=lam)
dat.list <- list(ap=ap,zp=zp)
## forecast using lapply
xa <- proc.time()
tt <- lapply(dat.list,function(x) lapply(x$z,function(y) (forecast::ets(y))))
xb <- proc.time()
The above code gives me what I need. I would like to apply parallel processing to both lapply calls in the code above, so I have attempted to use the snow package and an example shown on this site.
## Parallel Processing
clus <- makeCluster(3)
custom.function <- function(x) lapply(x$z,function(y) (forecast::ets(y)))
clusterExport(clus,"custom.function")
x1 <- proc.time()
tm <- parLapply(clus,dat.list,custom.function)
x2<-proc.time()
stopCluster(clus)
Below are my questions:
- For some reason, the output of tm is different from the non-parallel version: the forecast function ets is applied to every single data point as opposed to each element in the list.
Non parallel:
summary(tt)
Length Class Mode
ap 3 -none- list
zp 3 -none- list
Parallel Version:
summary(tm)
Length Class Mode
ap 300 -none- list
zp 300 -none- list
- My second question is how I should parallelize the lapply inside the custom function, basically a nested parLapply:
custom.function <- function(x) parLapply(clus,x$z,function(y) (forecast::ets(y))) ## Not working
Many thanks for your help.
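The problem is that the forecast package isn't loaded on the cluster workers, which causes lapply to iterate over the ts objects incorrectly: each worker walks over the 300 individual values of x$z instead of its 3 series. You can load forecast on the workers using clusterEvalQ, for example clusterEvalQ(clus, library(forecast)), before calling parLapply.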
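As a rough end-to-end sketch of that fix, the following reuses the names from the question (dat.list, custom.function, clus, tm). The helper make.item and the seed are only there to build the dummy data and are not part of the original code; it also draws 300 random values per matrix instead of recycling 30.

```r
library(snow)
library(forecast)

# Dummy data in the same shape as dat.list from the question.
# make.item is a hypothetical helper used only for this illustration.
set.seed(1)
make.item <- function(start.year, lam) {
  z <- ts(matrix(rnorm(300, 10, 10), 100, 3),
          start = c(start.year, 1), frequency = 12)
  list(z = z, lam = lam)
}
dat.list <- list(ap = make.item(1961, 0.8), zp = make.item(1971, 0.5))

custom.function <- function(x) lapply(x$z, function(y) forecast::ets(y))

clus <- makeCluster(3)
# Load forecast on every worker before any work is dispatched
clusterEvalQ(clus, library(forecast))
tm <- parLapply(clus, dat.list, custom.function)
stopCluster(clus)

# Compare with sapply(tt, length) from the non-parallel version
sapply(tm, length)
```

custom.function is passed directly to parLapply, so the separate clusterExport call is not needed for it in this sketch.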
To answer your second question, your attempt at nested parallelism failed because the workers don't have snow loaded or clus defined. But if you have thousands of lists, you should have plenty of ways to keep all of your cores busy without worrying about nested parallelism. Nesting is more likely to hurt your performance than help it, and it doesn't seem necessary.
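If the number of lists is ever too small to keep the workers busy, one possible alternative to nesting is to flatten the work into a single list with one task per series and make one parLapply call. This is only a sketch under assumptions: the names tasks, fits, ids, tm.flat and the helper make.item are made up for the illustration, and the columns of z are extracted explicitly with z[, j] so the result does not depend on how lapply splits a multivariate ts.

```r
library(snow)
library(forecast)

# Dummy data in the same shape as dat.list from the question.
# make.item is a hypothetical helper used only for this illustration.
set.seed(1)
make.item <- function(start.year, lam) {
  z <- ts(matrix(rnorm(300, 10, 10), 100, 3),
          start = c(start.year, 1), frequency = 12)
  list(z = z, lam = lam)
}
dat.list <- list(ap = make.item(1961, 0.8), zp = make.item(1971, 0.5))

# One task per (list, column) pair, flattened into a single list
tasks <- unlist(
  lapply(names(dat.list), function(nm) {
    z <- dat.list[[nm]]$z
    lapply(seq_len(ncol(z)), function(j) list(id = nm, col = j, y = z[, j]))
  }),
  recursive = FALSE
)

clus <- makeCluster(3)
clusterEvalQ(clus, library(forecast))  # load forecast on every worker
fits <- parLapply(clus, tasks, function(task) forecast::ets(task$y))
stopCluster(clus)

# Regroup the fitted models by the list they came from
ids <- vapply(tasks, function(task) task$id, character(1))
tm.flat <- split(fits, factor(ids, levels = names(dat.list)))
```

With a single flat task list, every worker can stay busy with one level of parallelism, which is the point made above about not needing nested parLapply.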