I tried using the snowfall package to run code in parallel in R on all my CPU cores. This is the code I used for testing:
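```r
library(snow)      # socket-cluster backend used by snowfall
library(snowfall)
sfInit(parallel = TRUE, cpus = 16, type = "SOCK")  # start 16 socket workers
data <- array(1:1000000, dim = c(1000000, 1))
# square every element; sfLapply spreads the elements over the workers
system.time(x <- sfLapply(data, fun = function(x) x * x))
```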
This effectively runs 16 times faster because it uses all available CPU cores. But when I try this:
```r
library(RWeka)  # provides J48
system.time(m2 <- J48(CHURNED_F ~ ., data = data[, -c(1)]))
```
it takes about 50 seconds, and that is only a test on about 1% of the whole data set. The following runs correctly, but it takes the same time and uses only one CPU:
```r
library(snow)
library(snowfall)
sfInit(parallel = TRUE, cpus = 16, type = "SOCK")
system.time(m2 <- sfLapply("CHURNED_F~.", J48, data[, -c(1)]))
```
Am I just using the wrong syntax? How can I make this run in parallel?
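For reference, here is a minimal sketch of how the work gets distributed, written with R's built-in `parallel` package instead of snowfall (a swap: `parallel` incorporates snow's cluster API, which snowfall wraps). `parLapply()` splits the elements of its input into one chunk per worker, so a one-element input such as `"CHURNED_F~."` becomes a single task that only one worker can run, while a longer input keeps several workers busy. The cluster size of 2, the `fit_fold()` helper and the fold split are hypothetical, and `lm()` on the built-in `mtcars` data stands in for J48 to show how independent fits (for example, one per cross-validation fold) can run side by side.

```r
library(parallel)  # ships with R; its cluster functions come from snow

cl <- makeCluster(2, type = "PSOCK")  # socket cluster, like sfInit(type = "SOCK")

# A one-element input becomes a single task, so only one worker runs it,
# no matter how many workers the cluster has.
pids_one <- unlist(parLapply(cl, "CHURNED_F~.", function(f) Sys.getpid()))

# Eight elements are split into one chunk per worker, so both workers run.
pids_many <- unlist(parLapply(cl, 1:8, function(i) Sys.getpid()))

# Hypothetical pattern: independent model fits are separate tasks.
# lm() on mtcars stands in for J48; each task drops one of four folds.
folds <- rep_len(1:4, nrow(mtcars))
fit_fold <- function(k, d, folds) {
  coef(lm(mpg ~ wt, data = d[folds != k, ]))
}
fold_coefs <- parLapply(cl, 1:4, fit_fold, d = mtcars, folds = folds)

stopCluster(cl)  # shut down the worker processes

length(unique(pids_one))   # 1
length(unique(pids_many))  # 2
```

Under these assumptions, a single model fit stays one task; the cores only help when the job can be split into several independent pieces of work.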