How to pass a large number of models to gather_predictions


In the modelr package, the function gather_predictions can be used to add predictions from multiple models to a data frame, but I'm unsure how to specify these models in the function call. The help documentation gives the following example:

library(modelr)
library(magrittr)  # provides %>%

df <- tibble::tibble(
  x = sort(runif(100)),
  y = 5 * x + 0.5 * x ^ 2 + 3 + rnorm(length(x))
)

m1 <- lm(y ~ x, data = df)
grid <- data.frame(x = seq(0, 1, length = 10))
grid %>% add_predictions(m1)

m2 <- lm(y ~ poly(x, 2), data = df)
grid %>% spread_predictions(m1, m2)
grid %>% gather_predictions(m1, m2)

Here the models are named explicitly in the function call. That works fine if we only want predictions for a few models, but what if we have a large or unknown number of models? In that case, specifying the models manually isn't really workable anymore.

The way the help documentation phrases the arguments section seems to suggest that every model has to be passed as a separate argument:

gather_predictions and spread_predictions take multiple models. The name will be taken from either the argument name or the name of the model.
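To make the naming rule in that quote concrete, here is a small sketch built on the help example. The seed and the labels `linear` and `quadratic` are arbitrary choices for illustration. Naming an argument sets its label in the `model` column, while an unnamed argument is labelled with the argument expression itself.

```r
library(modelr)

# Sample data from the help example (seed chosen arbitrarily)
set.seed(42)
df <- tibble::tibble(
  x = sort(runif(100)),
  y = 5 * x + 0.5 * x^2 + 3 + rnorm(length(x))
)
grid <- data.frame(x = seq(0, 1, length = 10))

m1 <- lm(y ~ x, data = df)
m2 <- lm(y ~ poly(x, 2), data = df)

# Named arguments: the names become the values of the `model` column
named <- gather_predictions(grid, linear = m1, quadratic = m2)

# Unnamed arguments: the argument expressions are used as labels
unnamed <- gather_predictions(grid, m1, m2)

unique(named$model)    # "linear" "quadratic"
unique(unnamed$model)  # "m1" "m2"
```

This suggests that any way of supplying many models needs to carry a name for each one.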

Passing a list of models to gather_predictions, for example, doesn't work.

Is there an easy way to pass a list (or a large number) of models to gather_predictions?

Example with 10 models stored in a list:

modelslist <- list()
for (N in 1:10) {
  modelslist[[N]] <- lm(y ~ poly(x, N), data = df)
}

If having the models stored some other way than a list works better, that's fine as well.
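One possible approach, sketched under the assumption that a recent modelr version is installed: because gather_predictions() collects its models through `...`, a named list can be spliced into the call with base R's do.call(), which turns every list element into its own named argument. The list names (here the arbitrary labels `poly1` to `poly10`) then become the values of the `model` column. The models are built with lapply() instead of a for loop, so each formula is created in its own function environment where N has that model's degree.

```r
library(modelr)

# Sample data from the help example (seed chosen arbitrarily)
set.seed(42)
df <- tibble::tibble(
  x = sort(runif(100)),
  y = 5 * x + 0.5 * x^2 + 3 + rnorm(length(x))
)
grid <- data.frame(x = seq(0, 1, length = 10))

# Fit polynomials of degree 1 to 10; each lapply() call binds its own N
modelslist <- lapply(1:10, function(N) lm(y ~ poly(x, N), data = df))

# These names become the values of the `model` column
names(modelslist) <- paste0("poly", 1:10)

# do.call() passes grid as the first argument and every list element
# as a separate named argument into the `...` of gather_predictions()
preds <- do.call(gather_predictions, c(list(grid), modelslist))

head(preds)
```

The same call pattern should also work for spread_predictions(), which takes its models the same way. Naming the list first matters: unnamed elements would likely not produce readable labels.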

Answer by Hack-R:
m <- grid %>% gather_predictions(lm(y ~ poly(x, 1), data = df))
for (N in 2:10) {
  m <- rbind(m, grid %>% gather_predictions(lm(y ~ poly(x, N), data = df)))
}
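As far as I can tell, gather_predictions() labels an unnamed model with its argument expression, which is the same `lm(y ~ poly(x, N), data = df)` in every iteration of the loop above, so the degrees may be hard to tell apart in the `model` column. A minimal variant, assuming dplyr is available, keeps the one-model-at-a-time idea but records the degree explicitly. `preds_for_degree()` is a hypothetical helper name.

```r
library(modelr)
library(dplyr)

# Sample data from the help example (seed chosen arbitrarily)
set.seed(42)
df <- tibble::tibble(
  x = sort(runif(100)),
  y = 5 * x + 0.5 * x^2 + 3 + rnorm(length(x))
)
grid <- data.frame(x = seq(0, 1, length = 10))

# Hypothetical helper: fit a degree-N polynomial, predict on the grid,
# and store an explicit label in a `model` column
preds_for_degree <- function(N) {
  fit <- lm(y ~ poly(x, N), data = df)
  grid %>%
    add_predictions(fit) %>%
    mutate(model = paste0("poly", N))
}

# One data frame per degree, stacked into long format like gather_predictions()
m <- bind_rows(lapply(1:10, preds_for_degree))
```

Collecting the pieces in a list and binding once also avoids growing `m` with rbind() on every iteration.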
Answer by Ehsan Danesh:

There are workarounds for this problem. My approach was to:

1. build a list of models with specific names, and
2. use a tweaked version of modelr::gather_predictions() to apply all models in the list to the data.

# prerequisites
library(tidyverse)
set.seed(1363)    

# I'll use the generic name 'data' throughout the code, so you can easily try other datasets.
# for this example I'll use your data df
data <- df

# data visualization
ggplot(data, aes(x, y)) + 
        geom_point(size=3)

[Figure: scatter plot of your sample data]

# build a list of models
models <- vector("list", length = 5)
model_names <- vector("character", length = 5)
for (i in 1:5) {
        modelformula <- str_c("y ~ poly(x,", i, ")", sep = "")
        models[[i]] <- lm(as.formula(modelformula), data = data)
        model_names[[i]] <- str_c("model", i) # remember: we name the models sequentially here
}

# apply names to the models list
names(models) <- model_names

# this is a modified version of modelr::gather_predictions() that accepts a list of models
gather.predictions <- function (data, models, .pred = "pred", .model = "model") 
{
        df <- map2(models, .pred, modelr::add_predictions, data = data)
        names(df) <- names(models)
        bind_rows(df, .id = .model)
}

# usage is the same as modelr's function; .pred = "y" stores the predictions in a column named y,
# which the plot below maps to the y aesthetic
grids <- gather.predictions(data = data, models = models, .pred = "y")

ggplot(data, aes(x, y)) + 
        geom_point() +
        geom_line(data = grids, colour = "red") +
        facet_wrap(~ model)

[Figure: polynomial models (degree 1 to 5) applied to your sample data]

Side note: there are good reasons why I chose strings to build the models (open for discussion).