Using tidymodels and the vip package in R, I computed the variable importance. Code-wise, it looks like this:
rf_vi_fit %>%
  pull_workflow_fit() %>%
  vip(geom = "point") +
  labs(title = "Random forest variable importance")
Visually it would look something like this:
However, what does the variable importance actually entail? Variable importance can be based on several metrics, such as the gain in R-squared or the Gini loss, but I am unsure what the variable importance from vip is based on. My other models have variable importance values of around 3 to 4, instead of around 0.005 as in this model.
I could not find what the variable importance is based on in the vip() documentation either.
The answer to your inquiry lies in several sections of the vip documentation (https://cran.r-project.org/web/packages/vip/vip.pdf).
The vip() function is a wrapper around vi() used to plot the variable importance scores. In the vip() documentation, the ... argument is "Additional optional arguments to be passed on to vi()".
In the vi() function, there is an argument called method.
Then, if you check the documentation of vi_model(), it describes in detail the model-specific VI score for each type of model. Below is an excerpt describing the random forest model-specific importance.