%dopar% is much slower when using iterators than when loading the data from text files. I've compared 3 use cases:
- dopar1: using %dopar% without any iterators
- dopar2: using %dopar% with iterators
- dopar3: using %dopar% when loading data from flat files
Calculating dopar3 is about 2 times faster (0.5 seconds) than calculating dopar1 or dopar2 (about 1.2 seconds each).
Both dopar1 and dopar2 have similar performance. I don't know whether the data frame in dopar1 is being converted into an iterator internally.
I was expecting dopar2 to outperform dopar3. Is it because dopar2 is still loading the entire dataset for each worker? If so, why aren't the iterators preventing that from happening? Or have I used iterators incorrectly? Any help is much appreciated.
(My system has 8 physical cores, so 4 cores were used for this parallel processing, as per the code.)
library('foreach')
library("parallelly")
library("parallel")
library("doParallel")
library("data.table")
# Setting up and registering the cluster
cluster1 = makeCluster(ceiling(detectCores(logical=FALSE)/2), type="PSOCK", outfile="")
cluster1 = autoStopCluster(cluster1)
registerDoParallel(cluster1)
# Generating Data
data1 = as.data.frame(matrix(round(runif(100000000), 2), ncol=100))
# Storing Data to Flat File -- For Use Case - 3
data_loc = file.path("1.Basics", "ParallelDataset")
dir.create(data_loc, recursive=TRUE, showWarnings=FALSE)
for (var1 in 1:ncol(data1)) {
  fwrite(data1[, var1, drop=FALSE], file.path(data_loc, paste(var1, ".csv", sep="")))
}
# Case-1 Parallel execution Without Iterator
system.time({
  dopar1 = foreach(var1=data1) %dopar% {
    sum(var1)
  }
})
# Case-2 Parallel execution With Iterator
system.time({
  dopar2 = foreach(var1=iter(data1, by="col")) %dopar% {
    sum(var1)
  }
})
# Case-3 Parallel execution - Loading data from flatfile
system.time({
  dopar3 = foreach(file1=list.files(data_loc)) %dopar% {
    sum(data.table::fread(file.path(data_loc, file1)))
  }
})
stopCluster(cluster1)
One driver is what OS your system is running. Not sure if parallelly is adding anything. And for makeCluster, I get faster results using registerDoParallel alone. Try this simplified approach in a clean R session.
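A minimal sketch of what a simplified setup along these lines might look like. It is an illustration only: the worker count of 2 and the smaller dataset (10,000 rows instead of 1,000,000) are assumptions chosen so it runs quickly, and it registers the backend with registerDoParallel(cores = ...) instead of an explicit makeCluster() call.

```r
library(foreach)
library(doParallel)  # also attaches the iterators and parallel packages

# Register the backend directly, with no explicit makeCluster() call.
# The worker count of 2 is an assumption for illustration.
registerDoParallel(cores = 2)

# Smaller than the data in the question so the sketch runs quickly (assumption).
data1 <- as.data.frame(matrix(round(runif(1e6), 2), ncol = 100))

# Case 1: pass the data frame itself to foreach
t_plain <- system.time(
  dopar1 <- foreach(var1 = data1) %dopar% sum(var1)
)

# Case 2: wrap the data frame in an explicit column iterator
t_iter <- system.time(
  dopar2 <- foreach(var1 = iter(data1, by = "col")) %dopar% sum(var1)
)

# Elapsed seconds for both cases
print(c(plain = t_plain[["elapsed"]], iterator = t_iter[["elapsed"]]))

# Releases the implicit cluster if one was created (harmless otherwise)
stopImplicitCluster()
```

On Unix-like systems, registerDoParallel(cores = n) runs the tasks in forked workers, so data1 does not have to be serialized and sent to each worker. On Windows it falls back to an implicitly created socket cluster, which is one way the OS can drive the timings.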