Difference between the working process of `mclapply` and `foreach()` loop


This is a general question asked out of curiosity. I am using the doParallel package to run simulations in parallel.

When I ran the simulation with a foreach loop, the memory usage reported by RStudio rose drastically (4+ GiB) and RStudio sometimes crashed.

I have now switched to parallel::mclapply and rerun the same simulation. Surprisingly, there is no problem, and the reported memory usage barely rises (10+ MiB).

I don't understand what is happening internally in the two cases, and I would appreciate a detailed explanation of how each one works.

sessionInfo() for my R is

R version 4.2.1 (2022-06-23) -- "Funny-Looking Kid"
Copyright (C) 2022 The R Foundation for Statistical Computing
Platform: aarch64-apple-darwin20 (64-bit)

The OS is macOS.

doParallel package version 1.0.17.

RStudio version 2023.03.01.

Example:

Suppose we want to count the edges of an Erdős-Rényi random graph. In each iteration I simulate a new graph and store its edge count.

The code is as follows:

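library(Rcpp)
library(doParallel)  # also attaches foreach, iterators, and parallel

# ER random graph generator (C++ source kept as a string for sourceCpp)
src1 <- "#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
NumericMatrix ER_AdjMatGEN_cpp(int N, double p){
  NumericMatrix temp(N,N);
  for(int i=0; i< N; i++){
    for(int j=0; j < i; j++){
        temp(i,j) = R::rbinom(1,p);
        temp(j,i) = temp(i,j);
    }
  }
  return temp;
}"
sourceCpp(code = src1)         # compile the source and export ER_AdjMatGEN_cpp to R
registerDoParallel(cores = 7)  # backend for %dopar%; core count assumed to match mc.cores below
Niter <- 10000
#________________
# Version 1: foreach
edgeCnt_result1 <- foreach(icount(Niter),
                           v = iter(function() ER_AdjMatGEN_cpp(10000, 0.3)),
                           .combine = "rbind") %dopar% {
                             sum(v)
                           }
#___________
# Version 2: mclapply
edgeCnt_result1 <- do.call(rbind, mclapply(1:Niter, function(i) {
  v <- ER_AdjMatGEN_cpp(N = 10000, p = 0.3)
  sum(v)
}, mc.cores = 7))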

When I run the first version (foreach), RStudio crashes, but the second version (mclapply) runs without problems.

Answer by George Ostrouchov

Your mclapply() code first generates the complete Niter-long list and then the do.call() rbinds it all together. There is only one allocation of the results vector.
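A minimal sketch of that collect-then-combine pattern, scaled down so it runs quickly. The values of Niter and mc.cores and the rbinom-based stand-in for the graph simulation are illustrative assumptions, and mclapply with mc.cores > 1 needs a Unix-like system such as macOS or Linux:

```r
library(parallel)

Niter <- 20  # illustrative, far smaller than the 10000 in the question

# mclapply returns a plain list holding one element per iteration.
# The body is a hypothetical stand-in for sum(ER_AdjMatGEN_cpp(...)).
res_list <- mclapply(seq_len(Niter), function(i) {
  sum(rbinom(100, 1, 0.3))
}, mc.cores = 2)

# The full list exists before any combining happens, so rbind runs once.
res <- do.call(rbind, res_list)
dim(res)
```

Nothing is combined until every worker has returned, so the result matrix is built in one step.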

foreach() on the other hand uses .combine to rbind list elements as they become available, iteratively re-allocating the result vector as it grows.
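The difference between growing a result step by step and binding it once can be seen in base R alone. This is only a sketch of the allocation pattern, not of foreach's internals; the vector vals is a hypothetical stand-in for the per-iteration results:

```r
n <- 2000
vals <- seq_len(n)  # hypothetical per-iteration results

# Incremental combine: each rbind call allocates a new, larger matrix
# and copies everything accumulated so far into it.
grow <- NULL
for (x in vals) grow <- rbind(grow, x)

# Single combine: keep the pieces in a list and bind them in one call.
once <- do.call(rbind, as.list(vals))

# Both approaches hold the same values in the same shape.
stopifnot(all(dim(grow) == dim(once)))
stopifnot(all(as.vector(grow) == as.vector(once)))
```

Wrapping each variant in system.time() with a larger n should make the extra copying of the incremental version visible.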

To avoid this, use the .multicombine and .maxcombine parameters of foreach(). To mimic the do.call(), set the first to TRUE and the second to Niter.
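A hedged sketch of what that could look like, using a small graph so it runs in seconds. The helper er_adjmat is a hypothetical base-R stand-in for ER_AdjMatGEN_cpp, the values of Niter, N, p, and the core count are illustrative, and the graph is generated inside the loop body here purely to keep the sketch short:

```r
library(doParallel)  # attaches foreach, iterators, and parallel

registerDoParallel(cores = 2)  # assumed core count for this sketch

Niter <- 50  # illustrative sizes, much smaller than in the question
N <- 30
p <- 0.3

# Hypothetical stand-in for ER_AdjMatGEN_cpp: symmetric 0/1 adjacency
# matrix with an empty diagonal.
er_adjmat <- function(N, p) {
  m <- matrix(0, N, N)
  m[lower.tri(m)] <- rbinom(N * (N - 1) / 2, 1, p)
  m + t(m)
}

# .multicombine = TRUE lets rbind receive many results in one call, and
# .maxcombine = Niter allows all of them to go into a single call.
edgeCnt <- foreach(i = seq_len(Niter),
                   .combine = "rbind",
                   .multicombine = TRUE,
                   .maxcombine = Niter) %dopar% {
  sum(er_adjmat(N, p))
}

stopImplicitCluster()
dim(edgeCnt)
```

As in the question's code, sum() over a symmetric adjacency matrix counts each undirected edge twice, so halve the values if the actual edge count is needed.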

See https://rpubs.com/jimhester/rbind for a more detailed explanation.