I am having an issue with tidymodels that I can't seem to figure out. Not sure if this is the intended behavior or an issue, but either way I would appreciate any help!
I am building a logistic regression prediction model with a two-level factor as the outcome, and per tidymodels convention have set the "positive class" as the first level. The base R stats::glm() assumes exactly the opposite: that the "positive class" is the second level, and the "reference" is the first level.
With that in mind, I anticipated that fitting a model with a tidymodels workflow vs. stats::glm() would result in estimated coefficients with similar magnitude and opposite directions. However, it seems that in reality, tidymodels is behaving as stats::glm() and treating the second level as the positive class.
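For reference, the stats::glm() level convention described above can be checked in isolation. This is a minimal base-R sketch, not part of the original workflow: it assumes only the built-in mtcars data, and the object names (fit_first_is_1, fit_first_is_0, df_flipped, coefs) are made up for illustration. It fits the same model twice, swapping which level comes first, so the two sets of coefficients can be compared side by side.

```r
# Minimal base-R sketch (no tidymodels needed); object names are illustrative.
df <- transform(mtcars, am = factor(am, levels = c("1", "0")))

# With a factor response, stats::glm() treats the first level as "failure",
# so this fit models the probability of the second level ("0").
fit_first_is_1 <- glm(am ~ mpg, family = binomial, data = df)

# Make "0" the first level; the modelled event now becomes "1".
df_flipped <- transform(df, am = relevel(am, ref = "0"))
fit_first_is_0 <- glm(am ~ mpg, family = binomial, data = df_flipped)

# Log-odds coefficients from both fits, side by side.
coefs <- cbind(first_is_1 = coef(fit_first_is_1),
               first_is_0 = coef(fit_first_is_0))
print(coefs)

# The same comparison on the odds-ratio scale.
print(exp(coefs))
```

If the two columns mirror each other in sign, then which level is listed first fully determines the direction of the reported odds ratios, which is the behavior in question here.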
library(tidymodels)

# Build a model to predict "manual" (am == 1)
# Positive class is the first factor level, per tidymodels convention
df <-
  mtcars %>%
  as_tibble() %>%
  mutate(am = factor(am, levels = c("1", "0")))

# tidymodels
recipe <- recipe(df) %>%
  update_role("am", new_role = "outcome") %>%
  update_role("mpg", new_role = "predictor")

glm_model <-
  logistic_reg() %>%
  set_engine("glm") %>%
  set_mode("classification")

glm_wf <-
  workflow() %>%
  add_recipe(recipe) %>%
  add_model(glm_model)

glm_fit <-
  glm_wf %>%
  fit(df)

glm_fit %>%
  extract_fit_parsnip() %>%
  tidy(exponentiate = TRUE)

# A tibble: 2 × 5
  term        estimate std.error statistic p.value
  <chr>          <dbl>     <dbl>     <dbl>   <dbl>
1 (Intercept)  738.        2.35       2.81 0.00498
2 mpg            0.736     0.115     -2.67 0.00751

# base R
glm(am ~ mpg,
    family = "binomial",
    data = df) %>%
  tidy(exponentiate = TRUE)

# A tibble: 2 × 5
  term        estimate std.error statistic p.value
  <chr>          <dbl>     <dbl>     <dbl>   <dbl>
1 (Intercept)  738.        2.35       2.81 0.00498
2 mpg            0.736     0.115     -2.67 0.00751

# Base R (which treats the second level as the positive class) and tidymodels (which I expected to treat the first level as the positive class) produce identical output!
Any ideas? This is causing a lot of havoc when I try to report ORs and then use yardstick for performance assessment (yardstick assumes positive class is first). Thanks so much for the help, loving tidymodels overall.