This is a general question, asked out of curiosity. I use the doParallel package to run simulations in parallel.
When I ran the simulation with a foreach loop, the memory usage reported by RStudio rose drastically (to more than 4 GiB) and RStudio sometimes crashed.
I then switched to parallel::mclapply and ran the same simulation again. Surprisingly, there was no problem, and memory usage barely rose (by roughly 10 MiB).
I don't understand what is happening internally. I would appreciate a detailed explanation of the difference between the two approaches.
sessionInfo() for my R is
R version 4.2.1 (2022-06-23) -- "Funny-Looking Kid"
Copyright (C) 2022 The R Foundation for Statistical Computing
Platform: aarch64-apple-darwin20 (64-bit)
The OS is macOS.
doParallel package version 1.0.17.
RStudio version 2023.03.01.
Example:
Suppose we want to count the edges of an Erdős-Rényi random graph. In each iteration I simulate a new graph and store its edge count.
The code is as follows:
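# ER random graph generator
library(Rcpp)
src1 <- "#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
NumericMatrix ER_AdjMatGEN_cpp(int N, double p){
  // Symmetric N x N adjacency matrix with a zero diagonal
  NumericMatrix temp(N,N);
  for(int i=0; i< N; i++){
    for(int j=0; j < i; j++){
      temp(i,j) = R::rbinom(1,p);
      temp(j,i) = temp(i,j);
    }
  }
  return temp;
}"
sourceCpp(code = src1)  # compile the C++ source and export ER_AdjMatGEN_cpp to R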
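library(doParallel)            # also attaches foreach, iterators and parallel
registerDoParallel(cores = 7)  # assumed backend setup; not shown in the original post
Niter <- 10000

# Version 1: foreach
edgeCnt_result1 <- foreach(icount(Niter),
                           v = iter(function() ER_AdjMatGEN_cpp(10000, 0.3)),
                           .combine = "rbind") %dopar% {
  sum(v)
}

# Version 2: parallel::mclapply
edgeCnt_result1 <- do.call(rbind, mclapply(1:Niter, function(i) {
  v <- ER_AdjMatGEN_cpp(N = 10000, 0.3)
  sum(v)
}, mc.cores = 7))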
When I run the first version (the foreach loop), RStudio crashes, but the second version (mclapply) runs properly.
Your mclapply() code first generates the complete Niter-long list, and then the do.call() rbinds it all together. There is only one allocation of the results vector. foreach(), on the other hand, uses .combine to rbind list elements as they become available, iteratively re-allocating the result vector as it grows.

Instead, use the .multicombine and .maxcombine parameters of foreach(). To mimic the do.call(), set the first to TRUE and the second to Niter.

See https://rpubs.com/jimhester/rbind for a more detailed explanation.
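
As a rough illustration of that suggestion, here is a minimal sketch. It makes several assumptions that are not in the original post: sim_edge_sum() is a hypothetical pure-R stand-in for ER_AdjMatGEN_cpp() followed by sum(), the sizes are tiny so it finishes quickly, the matrix is generated inside the loop body rather than in an iterator, and %do% is used so that it runs without a registered backend (swap in %dopar% after registering one).

```r
library(foreach)

# Hypothetical stand-in for ER_AdjMatGEN_cpp() + sum(): builds a small
# symmetric 0/1 adjacency matrix in plain R and returns the sum of its entries.
sim_edge_sum <- function(N, p) {
  m <- matrix(0, N, N)
  lower <- lower.tri(m)
  m[lower] <- rbinom(sum(lower), 1, p)  # one Bernoulli draw per vertex pair
  m <- m + t(m)                         # mirror into the upper triangle
  sum(m)
}

Niter <- 200  # far smaller than the 10000 used in the question

# foreach with .multicombine = TRUE: rbind may receive many results per call.
# .maxcombine = Niter lets a single call take up to Niter results.
set.seed(1)
res_foreach <- foreach(i = seq_len(Niter),
                       .combine = "rbind",
                       .multicombine = TRUE,
                       .maxcombine = Niter) %do% {
  sim_edge_sum(50, 0.3)
}

# The do.call(rbind, <list>) pattern from the mclapply version, run serially.
set.seed(1)
res_lapply <- do.call(rbind, lapply(seq_len(Niter), function(i) {
  sim_edge_sum(50, 0.3)
}))

# With the same seed and sequential execution, both give the same values.
same_values <- isTRUE(all.equal(as.vector(res_foreach), as.vector(res_lapply)))
```

Both versions return an Niter-by-1 matrix here. The sketch only shows the calling pattern; it does not measure memory, so whether this change alone resolves the crash for the full-size simulation would need to be checked separately.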