Parallel processing in R with "parallel" package - unpredictable runtime


I've been learning to parallelize code in R with the parallel package, specifically the mclapply() function with 14 cores.

Something I noticed after just a few runs is that repeated calls of mclapply() (with the same arguments, the same input, and the same number of cores) take significantly different amounts of time. For example, three back-to-back runs took 18s, then 23s, then 34s. After waiting a minute and running the code again, it was back down to 18s.

Is there some equivalent of "the computer needs a second to cool down" after running the code, which would mean that running separate calls of mclapply() back to back takes longer and longer, while waiting a minute or so before the next call brings the runtime back to normal?

I don't have much experience with parallelizing in R, but this is the only ad-hoc explanation I can think of. It would be very helpful to know if my reasoning checks out, and hear in more detail about why this might be happening. Thanks!
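
If it helps, here is a minimal sketch of the kind of test I have in mind, with a hypothetical heavy_task() standing in for my real simulation: time a few back-to-back runs, then time runs separated by a one-minute pause.

library(parallel)

# Hypothetical stand-in for the real workload; any CPU-bound function will do
heavy_task <- function(i) sum(rnorm(5e6))

# Elapsed wall-clock time for one mclapply() call on 14 cores
time_run <- function() {
  system.time(mclapply(1:28, heavy_task, mc.cores = 14))[["elapsed"]]
}

back_to_back <- replicate(3, time_run())               # no pause between runs
spaced <- replicate(3, { Sys.sleep(60); time_run() })  # one-minute pause first

print(back_to_back)
print(spaced)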

To clarify, my calls look like this:

library(parallel)
RNGkind("L'Ecuyer-CMRG")   # parallel-safe RNG: each child gets its own stream
set.seed(1)
x <- mclapply(training_data, simulation, testing_data,
              mc.cores = 14, mc.set.seed = TRUE)

Running this twice in a row, the second call takes a lot longer for me. If I wait a minute and then run it again, it is fast again.


1 Answer

Answered by shpartan:

I haven't used mclapply(), but I have used the parallel, foreach, and pbapply packages. I think the inconsistency comes from the small overheads involved in firing up the workers and in communicating the progress of tasks running in parallel.
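
For example, one way to avoid paying the worker startup cost on every call is to create the workers once with makeCluster() and reuse them with parLapply(). A minimal sketch, using the simulation, training_data, and testing_data names from the question (note that a PSOCK cluster, unlike the forked workers mclapply() uses, needs those objects exported to the workers):

library(parallel)

cl <- makeCluster(14)                      # start 14 workers once
clusterSetRNGStream(cl, iseed = 1)         # reproducible per-worker RNG streams
clusterExport(cl, c("simulation", "testing_data"))  # ship objects to workers

x <- parLapply(cl, training_data, simulation, testing_data)
# Later parLapply(cl, ...) calls reuse the same workers, with no startup cost

stopCluster(cl)                            # shut the workers down when finished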