I have a brm model that I'd like to generate predicted values for a fixed value of one parameter (length). I think I need to do this with the posterior_predict function (or possibly the posterior_epred function), but these functions both generate large matrices with no labels so it's impossible to know what is actually being generated.
library(tidyverse)
library(brms)
set.seed(42)
df <- data.frame(mass = rnorm(100, mean = 50, sd = 5),
length = rnorm(100, mean = 20, sd = 2),
sex = sample(c("f", "m"), 100, replace = TRUE),
site = sample(LETTERS[1:3], 100, replace = TRUE))
m <- brm(mass ~ length * sex + site, data = df, prior = prior(normal(0, 10)))
newdata <- data.frame(site = rep(LETTERS[1:3], each = 30),
sex = rep(c("f", "m"), 45),
length = 20)
pred1 <- bind_cols(newdata, predict(m, newdata)) # results in a usable dataframe
pred2 <- posterior_predict(m, newdata) # results in a large matrix with no labels
pred3 <- posterior_epred(m, newdata) # results in a large matrix with no labels
ggplot(data = pred1, aes(x = Estimate, y = site, col = sex)) + geom_violin()

Explanation of
posterior_predictoutputBoth
posterior_predict()andposterior_epred()produce a 4000x90 matrix. Why 4000 rows? It has 4000 rows because the model fit bybrm()uses an MCMC algorithm to sample the joint posterior distribution of the parameters. By default it does this with four Markov chains that return 1000 samples each, for a total of 4000. Why 90 columns? Becausenewdatain this case has 90 rows. The columns ofpred1andpred2correspond to the rows ofnewdata, in the same order.Incidentally the difference between
posterior_predict()andposterior_epred()is similar to the difference between a prediction interval and a confidence interval around the mean in a frequentist analysis.posterior_predict()gives you draws from the posterior predictive distribution, which incorporates the residual error and thus has higher variance. In contrastposterior_epred()gives you the distribution of the expected value of the posterior.Solution without any additional packages
To get a median and a 95% equal-tailed credible interval of the posterior predictive distribution for each row in
newdatayou could do this:Additional solution using tidybayes package
You may find the tidybayes package makes this easier. The functions
add_predicted_draws()(for the posterior predictive distribution) andadd_epred_draws()(for the expected value of the posterior) output long-form "tidy" data frames. These have 360,000 rows because each row ofnewdatais repeated 4000 times, one for each of the posterior draws. Then the functionmedian_qi(), imported from ggdist which is a dependency of tidybayes, can be used to summarize with equal-tailed credible intervals. This will give you similar results to the previous code, with some extra ID columns identifying the width of the interval and which row ofnewdataeach row of the summary comes from.The width of the interval is 0.95 by default but can be changed with the
.widthargument ofmedian_qi().