plot_slopes cannot compute predicted regression model values in an iterative function, but can predict each one outside of the function

Question

plot_slopes cannot compute predicted regression model values in an iterative function, but can predict each one outside of the function

117 views Asked by flâneur At 04 October 2023 at 19:28

This is a difficult question to answer as I cannot provide actual replication data (my apologies, the data is too large/sensitive) but I would appreciate anyone's ideas. I use sample data here to show the structure of the code, though this code will not produce the error I detail below.

I have a complicated function where I perform multiple regressions to isolate interaction effects and plot them together using the plot_slopes function from the wonderful marginaleffects package. My code is structured like the below.

library(tidyverse)
library(marginaleffects)

outcome_var_list = c("mpg","cyl","hp")
interact_var_list =  c("am","wt")
subset_var_list = c("3","4")

combos = expand.grid(outcome_var_list, subset_var_list)


model_ <- function(k,c){

mtcars = mtcars[mtcars$gear == c,]

outcome_var_list = outcome_var_list[outcome_var_list == k] 

results = list()
  for (r in interact_var_list) {
    f = paste(k, "~", r, "*factor(vs)")
    m = lm(f, subset(mtcars, carb %in% c(1,2,3,4)))
    s = plot_slopes(m, variables = r, condition = "vs", draw = FALSE)
    tmp = s[, c("estimate", "conf.low", "conf.high", "vs")]
    tmp$outcome = k
    tmp$regressor = r
    results = c(results, list(tmp))
  }
results = do.call("rbind", results)  


plot1 = results %>% 
  mutate(min = min(conf.low), max = max(conf.high)) %>% 
  ggplot(aes(x=factor(vs), 
             y=estimate, 
             color = regressor, 
             ymin=conf.low, 
             ymax=conf.high)) + 
  geom_errorbar(position = position_dodge(0.4)) + 
  geom_point(position = position_dodge(0.4)) + 
  scale_x_discrete(expand = c(0,0)) + 
  theme_light() + 
  ggtitle(label = paste0("Model 1: ",k)) + 
  theme(plot.title = element_text(hjust = 0.5)) + 
  labs(y= "Interaction Coefficient", x = "X") + 
  theme(plot.title = element_textbox_simple(vjust=-1)) +
  theme(plot.margin = margin(2,0,0,0, "cm")) + 
  theme(axis.text.x = element_text(size = 5))

  ggsave(plot1,file=paste0("plot_",k,".png"),path=paste0(c))

}


safe_model <- safely(model_)
iterate <-mapply(safe_model,combos$Var1,combos$Var2)

When I ran this before adding the extra subset_var_list everything was produced just fine.

However, when I added the subsetting var and switched to mapply, I now generate the model for entire subsets of the data where only a few outcome_vars can produce plots to save. I assumed this was due to a lack of coverage or something, so I removed safely, and this revealed the following error:

Error: Unable to compute predicted values with this model. You can try to supply a different dataset to the `newdata` argument.
  If this does not work, you can file a report on the Github Issue Tracker:
  https://github.com/vincentarelbundock/marginaleffects/issues

I again assumed this was due to a lack of coverage/variation, so I tried running plot_slopes manually for each individual interaction regression instead of through the function, and every single plot/regression was successfully generates for one of the subsets that otherwise would not generate plots. So it does not appear to be an issue with the model/lack of variation.

I may misunderstand how the newdata argument proposed in the error message works, but it seems that it has the complete data to perform the task and that that should provide no advantage. Is there a reason why the regressions/slope models can be produced individually/manually, while when put together in this function they cannot be predicted? How can I fix this problem and create all the plots I need for every subset?

UPDATE 2

I was able to make it work, but I am still wondering what the issue is, and hopefully if there is a way to increase the speed of running the models.

I filtered the model within the for loop to remove NA's for the outcome variable. My rationale with this is that some interaction variables in my dataset do not have any data coverage, so perhaps when they are all NAs it triggers the message about insufficient factor levels, even though all NAs are removed in the regression, so they should be dropped. I do this manually in case before the regression, and it now works

My new code is the below:


library(tidyverse)
library(marginaleffects)

outcome_var_list = c("mpg","cyl","hp")
interact_var_list =  c("am","wt")
subset_var_list = c("3","4")

combos = expand.grid(outcome_var_list, subset_var_list)


model_ <- function(k,c){

mtcars = mtcars[mtcars$gear == c,]

outcome_var_list = outcome_var_list[outcome_var_list == k] 

results = list()
  for (r in interact_var_list) {
    f = paste(k, "~", r, "*factor(vs)")
    m = lm(f, mtcars %>% filter(is.na(k))
    s = plot_slopes(m, variables = r, condition = "vs", draw = FALSE)
    tmp = s[, c("estimate", "conf.low", "conf.high", "vs")]
    tmp$outcome = k
    tmp$regressor = r
    results = c(results, list(tmp))
  }
results = do.call("rbind", results)  


plot1 = results %>% 
  mutate(min = min(conf.low), max = max(conf.high)) %>% 
  ggplot(aes(x=factor(vs), 
             y=estimate, 
             color = regressor, 
             ymin=conf.low, 
             ymax=conf.high)) + 
  geom_errorbar(position = position_dodge(0.4)) + 
  geom_point(position = position_dodge(0.4)) + 
  scale_x_discrete(expand = c(0,0)) + 
  theme_light() + 
  ggtitle(label = paste0("Model 1: ",k)) + 
  theme(plot.title = element_text(hjust = 0.5)) + 
  labs(y= "Interaction Coefficient", x = "X") + 
  theme(plot.title = element_textbox_simple(vjust=-1)) +
  theme(plot.margin = margin(2,0,0,0, "cm")) + 
  theme(axis.text.x = element_text(size = 5))

  ggsave(plot1,file=paste0("plot_",k,".png"),path=paste0(c))

}


safe_model <- safely(model_)
iterate <-mapply(safe_model,combos$Var1,combos$Var2)

This runs the models and allows me to generate them successfully. However, when I try to filter the dataframe outside of the results = list(... loop to increase the speed of the processing, I still receive the original error:


Error: Unable to compute predicted values with this model. You can try to supply a different dataset to the `newdata`
  argument. If this does not work, you can file a report on the Github Issue Tracker:
  https://github.com/vincentarelbundock/marginaleffects/issues

This is weird to me, because theoretically if the df is filtered at either stage, the results should be the same. Is there any reason for this error when filtering outside the for loop? And if it cannot be resolved, is there a way to speed up this process when filtering within the for loop?

Original Q&A

There are 1 answers

**I_O** · Answer 1 · 2023-10-05T20:15:37+00:00

Using the tidyverse anyway, you could create a grid (expand.grid) with the desired combinations and extend this dataframe with a column to hold the models and another one to hold the plot objects for further use (saving them as a side-effect along the way). Note the use of rowwise() and list() to fit an arbitrary object into a dataframe cell:

library(dplyr)
library(ggplot2)

## subset mtcars once only (unless you want different subsets
## per row):
d <- subset(mtcars, carb %in% 1:4)

result <- 
  expand.grid(outcome = c('mpg','cyl','hp'),
              regressor = c('am', 'wt')
              ) |>
  rowwise() |>
  mutate(model = list(
           lm(sprintf('%s ~ %s * factor(vs)', outcome, regressor), data = d)
         ),
         slopes = list(slopes(model)),
         plots = list({p <- ggplot(data = slopes) +
                    geom_point(aes(factor(vs), estimate, color = term))
                    ## save the plot as you go:
                    ggsave(p, filename = sprintf('plot_%s_%s.png', outcome, regressor))
                    p ## return the plot object
         })
         )

output:

## > result
## # A tibble: 6 x 5
## # Rowwise: 
##   outcome regressor model  slopes             plots 
##   <fct>   <fct>     <list> <list>             <list>
## 1 mpg     am        <lm>   <slopes [60 x 15]> <gg>  
## 2 cyl     am        <lm>   <slopes [60 x 15]> <gg>  
## 3 hp      am        <lm>   <slopes [60 x 15]> <gg>  
## 4 mpg     wt        <lm>   <slopes [60 x 15]> <gg>  
## 5 cyl     wt        <lm>   <slopes [60 x 15]> <gg>

pick and re-use objects as needed, e. g.:

> result$model[[3]]

Call:
lm(formula = sprintf("%s ~ %s * factor(vs)", outcome, regressor), 
    data = d)

Coefficients:
   (Intercept)              am     factor(vs)1  am:factor(vs)1  
        194.17          -50.42          -92.02           28.85

edit

To calculate per gear leavel, you can slightly modify above code (nest_by of {dplyr} would be another approach):

result <- 
  expand.grid(gear = 3:5, ## add gear to factor combinations
              outcome = c('mpg','cyl','hp'),
              regressor = c('am', 'wt')
              ) |>
  rowwise() |>
  mutate(model = list(
           lm(sprintf('%s ~ %s * factor(vs)', outcome, regressor),
              ## subset by gear:
              data = subset(d, gear == gear)
              )
         ),
         slopes = list(slopes(model)),
         plots = list({p <- ggplot(data = slopes) +
                    geom_point(aes(factor(vs), estimate, color = term))
                    ## save the plot as you go:
                    ggsave(p, filename = sprintf('plot_%s_%s.png', outcome, regressor))
                    p ## return the plot object
         })
         )

output:

## # A tibble: 18 x 6
## # Rowwise: 
##     gear outcome regressor model  slopes             plots 
##    <int> <fct>   <fct>     <list> <list>             <list>
##  1     3 mpg     am        <lm>   <slopes [60 x 15]> <gg>  
##  2     4 mpg     am        <lm>   <slopes [60 x 15]> <gg>  
##  3     5 mpg     am        <lm>   <slopes [60 x 15]> <gg>  
##  4     3 cyl     am        <lm>   <slopes [60 x 15]> <gg>  
## ... 14 more rows

TechQA.

plot_slopes cannot compute predicted regression model values in an iterative function, but can predict each one outside of the function

There are 1 answers

Related Questions in R

Related Questions in GGPLOT2

Related Questions in MAPPLY

Related Questions in MARGINAL-EFFECTS

Related Questions in R-MARGINALEFFECTS

Popular Questions

Trending Questions