Using modelr::add_predictions for glm


I am trying to calculate the logistic regression prediction for a set of data using the tidyverse and modelr packages. Clearly I am doing something wrong in the add_predictions as I am not receiving the "response" of the logistic function as I would if I were using the 'predict' function in stats. This should be simple, but I can't figure it out and multiple searches yielded little.

library(tidyverse)
library(modelr)
options(na.action = na.warn)
library(ISLR)

d <- as_tibble(ISLR::Default)
model <- glm(default ~ balance, data = d, family = binomial)
grid <- d %>% data_grid(balance) %>% add_predictions(model)

ggplot(d, aes(x=balance)) + 
    geom_point(aes(y = default)) + 
    geom_line(data = grid, aes(y = pred))

Answer by alistaire (best answer):

predict.glm's type parameter defaults to "link", i.e. predictions on the log-odds scale. add_predictions does not change that default, and the version discussed here gives you no way to switch to the almost certainly desired "response". (A GitHub issue exists; add your nice reprex to it if you like.) That said, it's not hard to use predict directly within the tidyverse via dplyr::mutate.
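
To make the two scales concrete, here is a minimal sketch showing that the default "link" predictions are log-odds and that passing them through the inverse logit (stats::plogis) gives the same numbers as type = "response". The new_data balances are arbitrary illustrative values, not taken from the question, and the ISLR package is assumed to be installed.

```r
# Fit the same model as in the question
d <- ISLR::Default
model <- glm(default ~ balance, data = d, family = binomial)

# Hypothetical balances chosen only for illustration
new_data <- data.frame(balance = c(500, 1500, 2500))

# Default scale: log-odds (can be any real number)
link <- predict(model, newdata = new_data, type = "link")

# Response scale: probabilities between 0 and 1
response <- predict(model, newdata = new_data, type = "response")

# The inverse logit of the link predictions matches the response predictions
all.equal(unname(plogis(link)), unname(response))
```

So a line drawn from the "link" values will not sit between the 0 and 1 of the observed outcome, which is what the plot in the question shows.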

Also note that ggplot is coercing default (a factor) to numeric in order to plot the line, which is fine, except that "No" and "Yes" are replaced by 1 and 2, while the probabilities returned by predict will be between 0 and 1. Explicitly coercing to numeric and subtracting one fixes the plot, though an extra scale_y_continuous call is required to fix the labels.

library(tidyverse)
library(modelr)

d <- as_tibble(ISLR::Default)
model <- glm(default ~ balance, data = d, family = binomial)

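# Build a grid of balance values and predict on the response (probability) scale
grid <- d %>% data_grid(balance) %>% 
    mutate(pred = predict(model, newdata = ., type = 'response'))

# Recode the factor to 0/1 so the points share a scale with the probabilities,
# then relabel the axis breaks with the original factor levels
ggplot(d, aes(x = balance)) + 
    geom_point(aes(y = as.numeric(default) - 1)) + 
    geom_line(data = grid, aes(y = pred)) + 
    scale_y_continuous('default', breaks = 0:1, labels = levels(d$default))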
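
For what it's worth, later modelr releases give add_predictions a type argument that is passed through to predict. The sketch below assumes such a release is installed (check ?add_predictions for your version) and verifies that the result matches a direct predict call.

```r
library(tidyverse)
library(modelr)

d <- as_tibble(ISLR::Default)
model <- glm(default ~ balance, data = d, family = binomial)

# Assumes a modelr version whose add_predictions() accepts `type`
grid <- d %>%
    data_grid(balance) %>%
    add_predictions(model, type = "response")

# Cross-check against calling predict() directly on the same grid
direct <- predict(model, newdata = grid, type = "response")
all.equal(unname(grid$pred), unname(direct))
```

If your installed version does not have the argument, the mutate approach above works regardless of the modelr version.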

Also note that if all you want is a plot, geom_smooth can calculate predictions directly for you:

ggplot(d, aes(balance, as.numeric(default) - 1)) + 
    geom_point() + 
    geom_smooth(method = 'glm', method.args = list(family = 'binomial')) + 
    scale_y_continuous('default', breaks = 0:1, labels = levels(d$default))