I'm taking a first look at tidymodels. The alternative for my current project would be calling ranger directly, without the tidymodels wrappers. In a test run on the classic iris dataset, a classification random forest fitted through tidymodels with the ranger engine is approximately ten times slower than the same model fitted by calling ranger directly. Why is that?
library(tidymodels)
library(ranger)
# Make example data
data("iris")
mydata <- iris[sample(1:nrow(iris), 600, replace = TRUE), ]
# Recipe
myrecipe <- mydata %>% recipe( Species ~ . )
# Setting a Ranger RF model
myRF <- rand_forest(trees = 300, mtry = 3, min_n = 1) %>%
  set_mode("classification") %>%
  set_engine("ranger")
# Setting a workflow
myworkflow <- workflow() %>%
  add_model(myRF) %>%
  add_recipe(myrecipe)
# Compare base ranger and tidy setup
time <- Sys.time()
fit_ranger <- ranger(Species ~ ., data = mydata, probability = TRUE,
                     mtry = 3, num.trees = 300, min.node.size = 1)
# difftime()'s third positional argument is tz, so units must be passed by name
ranger_time <- difftime(Sys.time(), time, units = "secs")
time <- Sys.time()
fit_tidy <- myworkflow %>%
  fit(data = mydata)
tidy_time <- difftime(Sys.time(), time, units = "secs")
tidy_time
ranger_time
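
A single Sys.time() difference can be noisy for fits this small, so here is a rough sketch of a repeated-timing comparison that might give steadier numbers. Note that `time_runs` is a hypothetical helper written for this post, not a tidymodels or ranger function, and the choice of 10 repetitions and the fixed seed are arbitrary assumptions. The model-fitting parts only run if the packages are installed.

```r
# Hypothetical helper (not part of tidymodels or ranger): call the
# zero-argument function `f` `reps` times and return the elapsed
# wall-clock seconds of each call.
time_runs <- function(f, reps = 10) {
  vapply(
    seq_len(reps),
    function(i) system.time(f())[["elapsed"]],
    numeric(1)
  )
}

set.seed(1)  # arbitrary seed, only so the resampled data is reproducible
mydata <- iris[sample(nrow(iris), 600, replace = TRUE), ]

if (requireNamespace("ranger", quietly = TRUE)) {
  # Direct ranger call with the same settings as in the question
  ranger_times <- time_runs(function() {
    ranger::ranger(Species ~ ., data = mydata, probability = TRUE,
                   mtry = 3, num.trees = 300, min.node.size = 1)
  })
  print(median(ranger_times))

  if (requireNamespace("tidymodels", quietly = TRUE)) {
    suppressPackageStartupMessages(library(tidymodels))

    # Same recipe + model + workflow as in the question
    myworkflow <- workflow() %>%
      add_model(
        rand_forest(trees = 300, mtry = 3, min_n = 1) %>%
          set_mode("classification") %>%
          set_engine("ranger")
      ) %>%
      add_recipe(recipe(Species ~ ., data = mydata))

    tidy_times <- time_runs(function() fit(myworkflow, data = mydata))
    print(median(tidy_times))
  }
}
```

Comparing the medians rather than one-off timings should at least show whether the gap is stable or mostly first-call overhead.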