There are many ways to apply a function to each row.
Here are some methods that I know:
method 1
for (i in seq_len(nrow(data))) { my_function(data[i, ]) }
method 2
apply(data,1,my_function)
method 3
library(plyr)
adply(data,.margins=1, .fun=my_function)
method 4
library(doParallel)
nodes <- detectCores()
cl <- makeCluster(nodes)
registerDoParallel(cl)
clusterEvalQ(cl,source("my_fun.R"))
adply(data, .margins = 1, .parallel = TRUE, .fun = my_function)
stopCluster(cl)
Among the first three methods, I think the third is the fastest. But my question is: when is method 4 (the parallel one) faster than method 3? Is there a way to tell before running all the code?
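One rough check I can think of is to time my_function on a small sample of rows, extrapolate to the full data set, and compare that estimate with the fixed cost of starting a cluster. The sketch below is only an illustration under assumptions: the data frame and my_function are hypothetical stand-ins for my real ones, the sample size of 200 and the 2-worker cluster are arbitrary, and it uses the base parallel package rather than plyr, so it ignores the cost of copying data to the workers.

```r
library(parallel)  # ships with base R

# Hypothetical stand-ins for the real data and function (assumptions)
data <- data.frame(a = runif(2000), b = runif(2000))
my_function <- function(row) sum(row)

# Time the function on a small random sample of rows
n_sample <- 200
idx <- sample(nrow(data), n_sample)
t_sample <- system.time(
  for (i in idx) my_function(data[i, ])
)["elapsed"]

# Extrapolate the serial cost to the full data set
est_serial <- unname(t_sample) / n_sample * nrow(data)

# Measure the fixed overhead of starting and stopping a cluster
t_overhead <- system.time({
  cl <- makeCluster(2)
  stopCluster(cl)
})["elapsed"]
t_overhead <- unname(t_overhead)

cat("estimated serial time (s):", est_serial, "\n")
cat("cluster start-up overhead (s):", t_overhead, "\n")
```

If the estimated serial time is not clearly larger than the start-up overhead, I would not expect the parallel version to pay off, but this is only a guess based on the sample.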