Difference between the working process of `mclapply` and `foreach()` loop

Question

326 views Asked by ann At 04 November 2023 at 05:32

This is a general question asked out of curiosity. I am using the doParallel package to run simulations in parallel.

When I ran the simulation with a foreach loop, the memory usage reported by RStudio rose drastically (4+ GiB) and RStudio sometimes crashed.

I have now switched to parallel::mclapply and rerun the same simulation. Surprisingly, there is no problem, and memory usage barely rises (10+ MiB).

I don't understand what is happening internally. I would appreciate a detailed explanation of how the two approaches differ.

The sessionInfo() output for my R installation is:

R version 4.2.1 (2022-06-23) -- "Funny-Looking Kid"
Copyright (C) 2022 The R Foundation for Statistical Computing
Platform: aarch64-apple-darwin20 (64-bit)

The OS is macOS.

doParallel package version 1.0.17.

RStudio version 2023.03.01.

Example:

Suppose we want to count the edges of an Erdős-Rényi graph. Each iteration simulates a new graph and stores its edge count.

The code is as follows:

library(Rcpp)
library(doParallel)  # also attaches foreach, iterators and parallel

# ER random graph generator
src1 <- "#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
NumericMatrix ER_AdjMatGEN_cpp(int N, double p){
  NumericMatrix temp(N,N);
  for(int i=0; i< N; i++){
    for(int j=0; j < i; j++){
        temp(i,j) = R::rbinom(1,p);
        temp(j,i) = temp(i,j);
    }
  }
  return temp;
}"
sourceCpp(code = src1)  # compile the C++ source and expose ER_AdjMatGEN_cpp to R

# Assumed: the original post does not show how the backend was registered
registerDoParallel(cores = 7)

Niter <- 10000

# Version 1: foreach
edgeCnt_result1 <- foreach(icount(Niter),
                           v = iter(function() ER_AdjMatGEN_cpp(10000, 0.3)),
                           .combine = "rbind") %dopar% {
                             sum(v)
                           }

# Version 2: mclapply
edgeCnt_result1 <- do.call(rbind, mclapply(1:Niter, function(i) {
  v <- ER_AdjMatGEN_cpp(N = 10000, p = 0.3)
  sum(v)
}, mc.cores = 7))

When I run the first version (foreach), RStudio crashes, but when I run the second version (mclapply), it runs properly.

There is 1 answer

**George Ostrouchov** · Answer 1 · 2023-11-15T05:18:01+00:00

Your mclapply() code first generates the complete Niter-long list, and then do.call() rbinds it all together in a single call. The result vector is allocated only once.

foreach(), on the other hand, uses .combine to rbind list elements as they become available, repeatedly reallocating the result vector as it grows.
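The two combine patterns described above can be sketched in base R, without any parallel backend. This is only an illustration of the allocation pattern, not a reproduction of foreach internals; `results` is a hypothetical stand-in for the per-iteration edge counts.

```r
# Hypothetical stand-in for the per-iteration results (one number each).
results <- lapply(1:2000, function(i) i %% 7)

# Incremental combine: every rbind() call builds a new, longer matrix
# and copies the previously accumulated rows into it.
grown <- NULL
for (r in results) {
  grown <- rbind(grown, r)
}

# Single combine: rbind() receives all pieces at once,
# so the result is built in one call.
once <- do.call(rbind, results)
```

Both objects hold the same values in a 2000 x 1 matrix; only the number of intermediate copies differs.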

Instead, use the .multicombine and .maxcombine parameters of foreach(). To mimic the do.call(), set the first to TRUE and the second to Niter.
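A minimal sketch of that suggestion follows. To keep it quick to run, it assumes a small `Niter`, two cores, and a pure-R helper `er_adjmat()` that is a hypothetical stand-in for the asker's `ER_AdjMatGEN_cpp()`; none of these values come from the original post.

```r
library(doParallel)  # also attaches foreach, iterators and parallel

registerDoParallel(cores = 2)  # assumed core count for illustration

Niter <- 50  # deliberately small; the question uses 10000

# Hypothetical pure-R stand-in for ER_AdjMatGEN_cpp():
# symmetric 0/1 adjacency matrix with an empty diagonal.
er_adjmat <- function(N, p) {
  m <- matrix(0, N, N)
  lower <- lower.tri(m)
  m[lower] <- rbinom(sum(lower), 1, p)
  m + t(m)
}

# .multicombine = TRUE lets rbind() receive many results per call;
# .maxcombine = Niter allows all of them to go into a single call.
edge_cnt <- foreach(i = seq_len(Niter),
                    .combine = "rbind",
                    .multicombine = TRUE,
                    .maxcombine = Niter) %dopar% {
  sum(er_adjmat(100, 0.3))
}

stopImplicitCluster()
```

Here `edge_cnt` is an `Niter` x 1 matrix, the same shape that `do.call(rbind, ...)` produces in the mclapply() version.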

See https://rpubs.com/jimhester/rbind for a more detailed explanation.
