Several publications highlight that variable importance scores derived from machine learning models may be biased. A recent study by Loh and Zou (2021) shows that ranger permutation-based variable importance scores produce unbiased results.
I am using tidymodels with a ranger engine to estimate a random forest model. How can I get ranger variable importance scores from the resulting fit? How do they differ from the variable importance scores from vip? From my understanding, the vip result in the example below is the random forest model-specific Gini importance.
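library(tidymodels)
library(vip)

# drop rows with missing values
aq <- na.omit(airquality)

# random forest via the ranger engine; importance is passed through to ranger
model_rf <-
  rand_forest(mode = "regression") %>%
  set_engine("ranger", importance = "permutation") %>%
  fit(Ozone ~ ., data = aq)

# variable importance (vi is exported, so :: is enough)
vip::vi(model_rf)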
I think you want to change the value of the importance argument to get the unbiased estimates. ranger has a function to get the importance scores, and the vip package offers a model-specific method as well.
Created on 2022-11-16 by the reprex package (v2.0.1)
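The code that accompanied this answer is not preserved here, so the following is only a minimal sketch of the two routes it describes, not the original reprex. It assumes the fit from the question, uses extract_fit_engine() to reach the underlying ranger object, and then calls ranger::importance() and vip::vi(). The seed value and the object names rf_engine, ranger_imp, vip_imp and comparison are my own choices for illustration. ranger's importance argument also accepts "impurity" and "impurity_corrected" in place of "permutation".

```r
library(tidymodels)
library(vip)

aq <- na.omit(airquality)

# Permutation importance involves randomness; the seed value is arbitrary
set.seed(123)

model_rf <-
  rand_forest(mode = "regression") %>%
  set_engine("ranger", importance = "permutation") %>%
  fit(Ozone ~ ., data = aq)

# Pull the underlying ranger object out of the parsnip fit
rf_engine <- extract_fit_engine(model_rf)

# Route 1: importance scores straight from ranger (named numeric vector)
ranger_imp <- ranger::importance(rf_engine)
ranger_imp

# Route 2: vip's model-specific scores for the same fit
# (a tibble with Variable and Importance columns)
vip_imp <- vip::vi(model_rf)
vip_imp

# Line the two up by variable name to compare them
comparison <- data.frame(
  variable = vip_imp$Variable,
  vip      = vip_imp$Importance,
  ranger   = unname(ranger_imp[vip_imp$Variable])
)
comparison
```

As far as I understand, vip's model-specific method for ranger reads the scores stored on the fitted object, so the two columns should agree for a given fit; which kind of importance they represent is determined by the importance value passed to set_engine().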