I am splitting my dataset by simulation ID and running a runjags model on each of the resulting subsets in parallel. This lets me take advantage of parallel processing and run my simulation on a cluster, so a job that would otherwise take over a day completes in around 2 hours.
Out of the 1000 simulations I'm running, around 25 fail. I am using a for loop with tryCatch to extract the coda samples from the simulations that ran successfully. The problem is that once a simulation fails, the error seems to affect the other results on the cores I am using, so I can't extract the coda samples for the simulations that ran successfully either.
Any suggestions? I am including code and log files below. Thank you.
library(parallel)
library(coda)

# 1) Use mclapply to apply a function to all simulations simultaneously
output_models <- parallel::mclapply(subsetdata, function(x) {
  library(runjags)
  set.seed(1)
  model_data <- x
  runJagsOut <- run.jags(method = "simple",
                         model = "tempModel.txt",
                         monitor = c("mu"),
                         data = model_data,
                         # inits = initsList, # NOTE: Let JAGS initialize.
                         n.chains = 1, # NOTE: Not only 1 chain.
                         adapt = 500,
                         burnin = 3000,
                         sample = 2500,
                         thin = 1,
                         summarise = TRUE,
                         plots = FALSE)
  return(runJagsOut)
}, mc.cores = numcores)

# 2) Build an empty list vector
mcmc <- list()

# 3) Extract the coda samples for each simulation; tryCatch is in place to 'ignore' simulations that fail
for (SimulID in 1:length(unique(df$SimulID))) {
  tryCatch({
    mcmc[[SimulID]] <- cbind(output_models[[SimulID]][["mcmc"]][[1]], SimulID)
  }, error = function(e) {cat("ERROR :", conditionMessage(e), "\n")})
}

# 4) Main text file with the coda for each simulation
lapply(mcmc, function(x) write.table(data.frame(x), 'output.txt', append = TRUE, sep = ',', col.names = FALSE))
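The tryCatch in step 3 only runs after mclapply has already returned, so it cannot stop a failure inside a worker from spoiling that worker's results. One possible adjustment is to catch the error inside the function passed to mclapply instead. The following is a minimal sketch under stated assumptions, not a tested fix for the model above: `fit_one` is a hypothetical stand-in for the run.jags() call, `safe_fit` is a hypothetical wrapper name, and `mc.cores = 2` is arbitrary (forking needs a Unix-like system).

```r
library(parallel)

# Hypothetical stand-in for the run.jags() call: fails for some inputs.
fit_one <- function(x) {
  if (x %% 4 == 0) stop("simulated failure for input ", x)
  x^2
}

# Catch the error inside the worker, so a failure becomes an ordinary
# return value (the condition object) instead of an error in user code.
safe_fit <- function(x) {
  tryCatch(fit_one(x), error = function(e) e)
}

inputs <- 1:8
results <- mclapply(inputs, safe_fit, mc.cores = 2)

# Separate the failed elements from the ones that ran successfully.
failed <- vapply(results, inherits, logical(1), what = "error")
good <- results[!failed]
```

With this arrangement the extraction loop could skip the elements flagged in `failed` rather than relying on an indexing error to detect them.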
This is what I see in the log file after the job completes on the cluster. It shows the 1000th simulation finishing, followed by a warning from mclapply, followed by the "subscript out of bounds" errors raised in the for loop.
. Initializing model
. Adapting 500
-------------------------------------------------| 500
++++++++++++++++++++++++++++++++++++++++++++++++++ 100%
Adaptation successful
. Updating 3000
-------------------------------------------------| 3000
************************************************** 100%
. . Updating 2500
-------------------------------------------------| 2500
************************************************** 100%
. . . . Updating 0
. Deleting model
.
Simulation complete. Reading coda files...
Coda files loaded successfully
Calculating summary statistics...
Finished running the simulation
Warning message:
In parallel::mclapply(subsetdata, function(x) { :
scheduled cores 39, 24, 35, 21, 22, 23, 29, 3, 8, 19, 47, 34, 6, 48, 10, 33, 38, 41, 31, 18, 5, 16, 37 encountered errors in user code, all values of the jobs will be affected
ERROR : subscript out of bounds
ERROR : subscript out of bounds
ERROR : subscript out of bounds
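The warning's wording ("all values of the jobs will be affected") appears to come from mclapply's default prescheduling, where each core is handed a batch of inputs up front and an error in one of them affects the whole batch. As far as I understand the parallel documentation, setting `mc.preschedule = FALSE` forks one job per input, so a failure should only affect its own element. A minimal sketch of that option, assuming a hypothetical `fit_one` in place of the run.jags() call and an arbitrary `mc.cores = 2`:

```r
library(parallel)

# Hypothetical stand-in for the run.jags() call: fails for some inputs.
fit_one <- function(x) {
  if (x %% 4 == 0) stop("simulated failure for input ", x)
  x^2
}

# One forked job per input instead of one batch per core.
# mclapply still warns that some calls resulted in an error.
results <- suppressWarnings(
  mclapply(1:8, fit_one, mc.cores = 2, mc.preschedule = FALSE)
)

# Failed elements come back as "try-error" objects; the rest are intact.
failed <- vapply(results, inherits, logical(1), what = "try-error")
good <- unlist(results[!failed])
```

Forking once per input adds overhead, which may matter less here since each simulation runs for a while. Indexing a "try-error" element with `[["mcmc"]]` would also explain the "subscript out of bounds" messages, because that element is a character vector rather than a runjags object.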