ranger variable importance scores from a tidymodel


Several publications highlight that variable importance scores derived from machine learning models may be biased. A recent study by Loh and Zou (2021) shows that ranger permutation-based variable importance scores produce unbiased results.

I am using tidymodels with the ranger engine to fit a random forest model. How can I get ranger variable importance scores from the resulting fit? How do they differ from the variable importance scores that vip returns? From my understanding, the vip call in the example below returns the random-forest model-specific Gini importance.

library(tidymodels)
library(vip)

aq <- na.omit(airquality)

model_rf <-
  rand_forest(mode = "regression") %>%
  set_engine("ranger", importance = "permutation") %>%
  fit(Ozone ~ ., data = aq)

# variable importance
vip::vi(model_rf)

There is 1 answer

topepo (best answer)

I think you want to change the value of the importance argument to get the unbiased estimates. ranger has its own function for extracting the importance scores, and the model-specific method in the vip package returns the same values:

library(tidymodels)
library(vip)
#> 
#> Attaching package: 'vip'
#> The following object is masked from 'package:utils':
#> 
#>     vi

aq <- na.omit(airquality)

set.seed(1)
model_rf <-
  rand_forest(mode = "regression") %>%
  set_engine("ranger", importance = "impurity_corrected") %>%
  fit(Ozone ~ ., data = aq)

model_rf %>% 
  extract_fit_engine() %>% 
  ranger::importance() %>% 
  sort(decreasing = TRUE)
#>      Temp      Wind   Solar.R     Month       Day 
#> 27919.050 23028.379  6830.772  3077.430  1597.355

# the same as using ranger directly
vip::vi(model_rf)
#> # A tibble: 5 × 2
#>   Variable Importance
#>   <chr>         <dbl>
#> 1 Temp         27919.
#> 2 Wind         23028.
#> 3 Solar.R       6831.
#> 4 Month         3077.
#> 5 Day           1597.

Created on 2022-11-16 by the reprex package (v2.0.1)
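
To see how the choice of the importance argument changes what gets reported, here is a minimal sketch that fits the same forest with ranger directly, once with "permutation" (as in the question) and once with "impurity_corrected" (as above). It assumes the ranger package is installed; fit_importance is a hypothetical helper introduced only for this illustration, and the scores will vary with the seed.

```r
library(ranger)

aq <- na.omit(airquality)

# Hypothetical helper: fit the same forest, changing only the importance mode.
fit_importance <- function(mode) {
  set.seed(1)
  ranger(Ozone ~ ., data = aq, importance = mode)
}

fit_perm <- fit_importance("permutation")
fit_corr <- fit_importance("impurity_corrected")

# The fitted ranger object records which measure it computed.
fit_perm$importance.mode
fit_corr$importance.mode

# importance() returns a named numeric vector, one entry per predictor.
imp_perm <- sort(importance(fit_perm), decreasing = TRUE)
imp_corr <- sort(importance(fit_corr), decreasing = TRUE)

imp_perm
imp_corr
```

The two measures are on different scales, so compare the rankings rather than the raw numbers. For a tidymodels fit, the same importance.mode element should be available on the object returned by extract_fit_engine(), which is a quick way to check which measure a given fit actually holds.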