I'm using foreach and have been reading up on it, e.g.:
- https://www.r-bloggers.com/the-wonders-of-foreach/
- https://www.rdocumentation.org/packages/foreach/versions/1.4.3/topics/foreach
My understanding is that you would use %dopar% for parallel processing and %do% for sequential.
As it happens, I was having issues with %dopar%, and while trying to debug I changed it to what I thought was a sequential loop using %do%. I happened to have the terminal open and noticed all processors running while I ran the loop.
Is this expected?
Reproducible example:
library(tidyverse)  # provides %>% and mutate()
library(caret)
library(foreach)

# expected to see parallel processing here because of caret and xgboost in train()
xgbFit <- train(Species ~ ., data = iris, method = "xgbTree",
                trControl = trainControl(method = "cv", classProbs = TRUE))

# stack 1000 copies of iris to get a larger data set
iris_big <- do.call(rbind, replicate(1000, iris, simplify = FALSE))
nr <- nrow(iris_big)
n <- 1000 # loop over in chunks of 1000 rows
pieces <- split(iris_big, rep(1:ceiling(nr/n), each = n, length.out = nr))
lenp <- length(pieces)

# did not expect to see parallel processing take place when running the block below
predictions <- foreach(i = seq_len(lenp)) %do% {
  # get prediction
  preds <- pieces[[i]] %>%
    mutate(xgb_prediction = predict(xgbFit, newdata = .))
  return(preds)
}

# combine the per-chunk results into one data frame
bah <- do.call(rbind, predictions)
My best guess would be that these are processes still running from previous runs. Is it the same when using foreach::registerDoSEQ()?
My second guess would be that predict runs in parallel.
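A minimal sketch of how the first guess could be checked, assuming only the foreach package is loaded. It asks foreach which backend is registered and confirms that a %do% loop stays in a single R process. The names backend_name, n_workers and pids are just illustrative. It says nothing about threads started inside predict() itself (xgboost can use several threads on its own), so if this reports a sequential backend, the second guess looks more likely.

```r
library(foreach)

# Force the sequential backend, then ask foreach what it has registered.
registerDoSEQ()

backend_name <- getDoParName()     # name of the currently registered backend
n_workers    <- getDoParWorkers()  # number of workers that backend reports

cat("backend:", backend_name, "\n")
cat("workers:", n_workers, "\n")

# Record the process id seen by each iteration of a %do% loop.
# %do% evaluates the body in the current R session, one iteration at a time.
pids <- foreach(i = 1:3, .combine = c) %do% Sys.getpid()

cat("distinct R processes used by %do%:", length(unique(pids)), "\n")
```

If the loop only ever sees one process id while the terminal still shows all cores busy, the extra load is probably coming from multithreading inside the model's prediction code rather than from foreach.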