I'm trying to implement a nested for loop using foreach and doParallel, but I don't want to loop over all combinations of values. I have a square dataset and want to run a function over each pair of values without duplicating work: for example, I need the result for [1,2] but not for [2,1], since it is the same. Here is a very basic example; note that I want to use doParallel because the actual function and calculations are complex.
bvec <- seq(1, 10, 1)
avec <- seq(1, 10, 1)
x <- data.frame()
for (i in 1:10) {
  for (j in i:10) {  # upper triangle only: j >= i
    x[i, j] <- sim(avec[i], bvec[j])
  }
}
x
The original dataset is about 1800 x 1800, so doing all pairwise calculations would take over 3.2 million calls, roughly half of which are unnecessary. Here is what I've got for the foreach:
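For reference, the counts can be checked with a little arithmetic. This is only a quick sketch and assumes the diagonal pairs [i,i] are needed too, as in the loop above where j starts at i:

```r
n <- 1800
all_pairs    <- n^2              # every [i, j] combination
unique_pairs <- n * (n + 1) / 2  # upper triangle, diagonal included
no_diag      <- choose(n, 2)     # upper triangle, diagonal excluded
c(all = all_pairs, unique = unique_pairs, no_diag = no_diag)
```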
library(foreach)
library(doParallel)

cl <- parallel::makeCluster(parallel::detectCores() - 4)
doParallel::registerDoParallel(cl)
parallel::clusterExport(cl, c("bvec", "avec"))
z <-
  foreach(i = 1:10, .combine = "cbind") %:%
    foreach(j = i:10) %dopar% {
      x[i, j] <- sim(avec[i], bvec[j])
    }
z
parallel::stopCluster(cl)
Is it possible to limit the iterations using foreach? If not, is there any other way to optimize this process?
I've tried changing the foreach statement to
foreach(i = 1:10, .combine = "cbind") %:%
  foreach(j = i:10) %dopar% {
    x[i, j] <- sim(avec[i], bvec[j])
  }
but that obviously doesn't work.
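One possible way to limit the iterations, sketched here under assumptions rather than taken from the original code: build the list of (i, j) pairs up front, loop over that list in a single foreach, have each task return its value instead of assigning into x, and fill the matrix afterwards on the master process. The `sim` below is a hypothetical stand-in for the real function, and the worker count of 2 is arbitrary.

```r
library(foreach)
library(doParallel)

# Hypothetical stand-in for the real (expensive) function
sim <- function(a, b) a + b

avec <- seq(1, 10, 1)
bvec <- seq(1, 10, 1)
n <- length(avec)

# Index pairs for the upper triangle, diagonal included (j >= i)
idx <- which(upper.tri(matrix(NA, n, n), diag = TRUE), arr.ind = TRUE)

cl <- parallel::makeCluster(2)  # assumed worker count
doParallel::registerDoParallel(cl)

# One task per pair; each task returns a value rather than writing to x
vals <- foreach(k = seq_len(nrow(idx)), .combine = c) %dopar% {
  sim(avec[idx[k, "row"]], bvec[idx[k, "col"]])
}

parallel::stopCluster(cl)

# Fill the result matrix on the master process
x <- matrix(NA_real_, n, n)
x[idx] <- vals
x
```

With around 1.6 million pairs, one task per pair may carry noticeable scheduling overhead, so grouping pairs into larger chunks per task is probably worth trying.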
Edit - The ideas below benchmark slower than the simple loop, and %do% is faster than %dopar%. The slowdown becomes noticeable at a vector length of about 200. You'll want to benchmark basic parallel processing on your own machine to see whether parallelism is worth the overhead going forward.
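A rough way to check this on your own machine is to time the same work with %do% and %dopar%. The sketch below is illustrative only: `sim` is a hypothetical cheap, vectorized stand-in, the worker count of 2 is an assumption, and each task handles one whole row of the triangle rather than a single pair.

```r
library(foreach)
library(doParallel)

# Hypothetical cheap stand-in; a heavier function shifts the balance toward %dopar%
sim <- function(a, b) a + b
vec <- seq_len(200)

cl <- parallel::makeCluster(2)  # assumed worker count
doParallel::registerDoParallel(cl)

# Sequential: one task per row i, covering columns i..n
t_seq <- system.time(
  r_seq <- foreach(i = vec, .combine = c) %do% sum(sim(vec[i], vec[i:length(vec)]))
)
# Parallel: same tasks, dispatched to the workers
t_par <- system.time(
  r_par <- foreach(i = vec, .combine = c) %dopar% sum(sim(vec[i], vec[i:length(vec)]))
)

parallel::stopCluster(cl)

# Elapsed seconds for each variant
c(do = t_seq[["elapsed"]], dopar = t_par[["elapsed"]])
```

The timings depend heavily on how expensive the real function is, so it is worth repeating this with something closer to the actual workload.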
...
I ran microbenchmark on 1800x1800 data, and your nested if() triangle loop is faster than outer() at that number of calculations for sum().

Here is a way to do foreach nesting (lifted from the docs at https://cran.r-project.org/web/packages/foreach/vignettes/nested.html ) combined with an ifelse() trick of evaluating the inner loop and skipping the heavy function for half the triangle.

The j=i:10 idea and writing to a global object works with %do%, but not %dopar%, which is discussed in this thread https://stackoverflow.com/a/45920140/10276092 and says "[%dopar%] does not change the global object [x]".

Below kind of works, but recycles the skipped values. Triangle shape isn't correct. Matrix magic from https://stackoverflow.com/a/48988950/10276092 to make data slightly presentable.