This is a difficult question to answer as I cannot provide actual replication data (my apologies, the data is too large/sensitive) but I would appreciate anyone's ideas. I use sample data here to show the structure of the code, though this code will not produce the error I detail below.
I have a complicated function where I perform multiple regressions to isolate interaction effects and plot them together using the plot_slopes function from the wonderful marginaleffects package. My code is structured like the below.
library(tidyverse)
library(marginaleffects)
outcome_var_list = c("mpg","cyl","hp")
interact_var_list = c("am","wt")
subset_var_list = c("3","4")
combos = expand.grid(outcome_var_list, subset_var_list)
model_ <- function(k,c){
mtcars = mtcars[mtcars$gear == c,]
outcome_var_list = outcome_var_list[outcome_var_list == k]
results = list()
for (r in interact_var_list) {
f = paste(k, "~", r, "*factor(vs)")
m = lm(f, subset(mtcars, carb %in% c(1,2,3,4)))
s = plot_slopes(m, variables = r, condition = "vs", draw = FALSE)
tmp = s[, c("estimate", "conf.low", "conf.high", "vs")]
tmp$outcome = k
tmp$regressor = r
results = c(results, list(tmp))
}
results = do.call("rbind", results)
plot1 = results %>%
mutate(min = min(conf.low), max = max(conf.high)) %>%
ggplot(aes(x=factor(vs),
y=estimate,
color = regressor,
ymin=conf.low,
ymax=conf.high)) +
geom_errorbar(position = position_dodge(0.4)) +
geom_point(position = position_dodge(0.4)) +
scale_x_discrete(expand = c(0,0)) +
theme_light() +
ggtitle(label = paste0("Model 1: ",k)) +
theme(plot.title = element_text(hjust = 0.5)) +
labs(y= "Interaction Coefficient", x = "X") +
theme(plot.title = element_textbox_simple(vjust=-1)) +
theme(plot.margin = margin(2,0,0,0, "cm")) +
theme(axis.text.x = element_text(size = 5))
ggsave(plot1,file=paste0("plot_",k,".png"),path=paste0(c))
}
safe_model <- safely(model_)
iterate <-mapply(safe_model,combos$Var1,combos$Var2)
When I ran this before adding the extra subset_var_list everything was produced just fine.
However, when I added the subsetting var and switched to mapply, I now generate the model for entire subsets of the data where only a few outcome_vars can produce plots to save. I assumed this was due to a lack of coverage or something, so I removed safely, and this revealed the following error:
Error: Unable to compute predicted values with this model. You can try to supply a different dataset to the `newdata` argument.
If this does not work, you can file a report on the Github Issue Tracker:
https://github.com/vincentarelbundock/marginaleffects/issues
I again assumed this was due to a lack of coverage/variation, so I tried running plot_slopes manually for each individual interaction regression instead of through the function, and every single plot/regression was successfully generates for one of the subsets that otherwise would not generate plots. So it does not appear to be an issue with the model/lack of variation.
I may misunderstand how the newdata argument proposed in the error message works, but it seems that it has the complete data to perform the task and that that should provide no advantage. Is there a reason why the regressions/slope models can be produced individually/manually, while when put together in this function they cannot be predicted? How can I fix this problem and create all the plots I need for every subset?
UPDATE 2
I was able to make it work, but I am still wondering what the issue is, and hopefully if there is a way to increase the speed of running the models.
I filtered the model within the for loop to remove NA's for the outcome variable. My rationale with this is that some interaction variables in my dataset do not have any data coverage, so perhaps when they are all NAs it triggers the message about insufficient factor levels, even though all NAs are removed in the regression, so they should be dropped. I do this manually in case before the regression, and it now works
My new code is the below:
library(tidyverse)
library(marginaleffects)
outcome_var_list = c("mpg","cyl","hp")
interact_var_list = c("am","wt")
subset_var_list = c("3","4")
combos = expand.grid(outcome_var_list, subset_var_list)
model_ <- function(k,c){
mtcars = mtcars[mtcars$gear == c,]
outcome_var_list = outcome_var_list[outcome_var_list == k]
results = list()
for (r in interact_var_list) {
f = paste(k, "~", r, "*factor(vs)")
m = lm(f, mtcars %>% filter(is.na(k))
s = plot_slopes(m, variables = r, condition = "vs", draw = FALSE)
tmp = s[, c("estimate", "conf.low", "conf.high", "vs")]
tmp$outcome = k
tmp$regressor = r
results = c(results, list(tmp))
}
results = do.call("rbind", results)
plot1 = results %>%
mutate(min = min(conf.low), max = max(conf.high)) %>%
ggplot(aes(x=factor(vs),
y=estimate,
color = regressor,
ymin=conf.low,
ymax=conf.high)) +
geom_errorbar(position = position_dodge(0.4)) +
geom_point(position = position_dodge(0.4)) +
scale_x_discrete(expand = c(0,0)) +
theme_light() +
ggtitle(label = paste0("Model 1: ",k)) +
theme(plot.title = element_text(hjust = 0.5)) +
labs(y= "Interaction Coefficient", x = "X") +
theme(plot.title = element_textbox_simple(vjust=-1)) +
theme(plot.margin = margin(2,0,0,0, "cm")) +
theme(axis.text.x = element_text(size = 5))
ggsave(plot1,file=paste0("plot_",k,".png"),path=paste0(c))
}
safe_model <- safely(model_)
iterate <-mapply(safe_model,combos$Var1,combos$Var2)
This runs the models and allows me to generate them successfully. However, when I try to filter the dataframe outside of the results = list(... loop to increase the speed of the processing, I still receive the original error:
Error: Unable to compute predicted values with this model. You can try to supply a different dataset to the `newdata`
argument. If this does not work, you can file a report on the Github Issue Tracker:
https://github.com/vincentarelbundock/marginaleffects/issues
This is weird to me, because theoretically if the df is filtered at either stage, the results should be the same. Is there any reason for this error when filtering outside the for loop? And if it cannot be resolved, is there a way to speed up this process when filtering within the for loop?
Using the tidyverse anyway, you could create a grid (
expand.grid) with the desired combinations and extend this dataframe with a column to hold the models and another one to hold the plot objects for further use (saving them as a side-effect along the way). Note the use ofrowwise()andlist()to fit an arbitrary object into a dataframe cell:output:
pick and re-use objects as needed, e. g.:
edit
To calculate per gear leavel, you can slightly modify above code (
nest_byof {dplyr} would be another approach):output: