Extract treatment means from an lmer object and calculate error bars

5k views Asked by At

[I'm detailing the experiment I have for background - I am clear on the method for the lmers, just unclear on how to extract some values I need/calculate them by hand, hence I posted this on SO and not CV. I hope this was the correct place to post!]

The data are here.

My experiment has a split-plot design, with levels: Block/Plot/Subplot.

There are 6 blocks. There are 2 plots in each block, and two subplots in each plot. Treatment 1 has two levels (A and B) and is applied at the Plot level: in each block there is one plot receiving Treatment 1 level A and one receiving Treatment 1 level B.

Treatment 2 is applied at the subplot level and also has two levels (C and D): each plot has one subplot receiving Treatment Two level A and one subplot receiving Treatment 2 level B.

The experiment was run for 3 years. I am interested how each combination of the two treatments impacts my dependent variable (DV).

As such I have 4 treatment combinations:

TMT1A:TMT2C

TMT1B:TMT2C

TMT1A:TMT2D

TMT1b:TMT2D

I'm using lmer for my models to account for the split-plot design. I'm running a cross year model but also a model for each year in turn (as replication in the experiment does not permit the testing of a year effect in the cross-year model - the models were ending up overparameterised).

The lmers for each year look like this:

m2011<- lmer (DV2011~ TMT1*TMT2 + (1|Block/TMT1))
m2012<- lmer (DV2012~ TMT1*TMT2 + (1|Block/TMT1))
m2013<- lmer (DV2013~ TMT1*TMT2 + (1|Block/TMT1))

For a graphical representation of the change in these treatment means over time, I want to extract the treatment means for each level of each treatment (see the four levels above) for each year, and plot these for each year of the experiment, akin to the example in this post

I am wondering, is it possible to extract the treatment means for the four different treatment combinations (such as those listed above) from an lmer object? Or would they have to be calculated by hand?

One way I thought to do it would be to actually create another factor which represents the 4 treatment combinations (see column "TMT1x2" in pasted data). Then I could run the following model for each year:

m2011<- lmer (DV2011~ TMT1x2 + (1|Block/TMT1))

and extract the treatment means for each of the 4 levels that way. However I am unsure if this method would suitably control for the split plot design, as this new 4 level factor ignores the nested nature of the levels that make it up (although the random effects do not ignore it)...

Further, if I do need to calculate the treatment means by hand, does anyone know how this could be done accounting for the levels of nesting in my experiment?

I would also like to calculate error bars around each of these treatment means...

If anyone has any insight on any of this it would be much appreciated!

3

There are 3 answers

3
Henrik On BEST ANSWER

An alternative that uses functions in package languageR. I call your data set df.

library(lme4)
library(languageR)
library(ggplot2)

# fit model
# n.b. I don't claim that this is a sensible model
# It is just used to demonstrate the plot
mod <- lmer(DV ~ TMT1 * TMT2 + (1|Block), data = df)

# create MCMC matrix
mcmc <- pvals.fnc(mod, nsim = 1000, withMCMC = TRUE)
# pval.fnc also calculates MCMC-based p-values and HPD confidence intervals,
# and plot the posterior distributions of the parameters

# plot using plotLMER.fnc 
# in addition, set withList = TRUE to create a list of data frames with plot data
# which can be used for a (possibly prettier) plot in ggplot
ll <- plotLMER.fnc(mod, withList = TRUE, pred = "TMT1", 
               intr = list(
                 "TMT2",
                 c("C", "D"),
                 "end",
                 list(c("red",  "blue"), rep(1, 2))),
               addlines = TRUE,
               mcmcMat = mcmc$mcmc)

 # here follows additional steps to plot using ggplot 

 # convert list to data frame
 df <- do.call(rbind, ll$TMT1)

 # rename 
 names(df)[names(df) == "Levels"] <- "TMT1"

 # add TMT2
 df$TMT2 <- rep(c("C", "D"), each = 2)

# plot using ggplot
dodge <- position_dodge(width = 0.1)
ggplot(data = df, aes(x = TMT1, y = Y, col = TMT2, group = TMT2)) +
   geom_point(position = dodge, size = 3) +
   geom_errorbar(aes(ymax = upper, ymin = lower, width = 0.1), position = dodge) +
   geom_line(position = dodge) +
   ylab("DV") +
   theme_classic()
2
Nate Pope On

I think what you are asking for is some form of predict(), for which there is no default method for class mer in lme4 (at least the version on CRAN). You can, however, use ez::ezPredict.

library(ez)
library(ggplot2)
to_predict <- expand.grid(TMT1=c("A","B"), TMT2=c("C","D"))
t_means <- rbind(ezPredict(m2011, to_predict=to_predict, boot=F), ezPredict(m2012, to_predict=to_predict, boot=F), ezPredict(m2013, to_predict=to_predict, boot=F) )
t_means$YEAR = rep(2011:2013, each = 4)
ggplot(t_means, aes(x=YEAR, y=value, color=TMT1:TMT2)) + geom_point() + geom_line()

This function has some additional features which may prove useful, like providing bootstrapped values.

If all you want are point estimates for the treatment means, it's just as easy to do the calculations by hand, especially as all three models have the same design matrix:

mm = unique(model.matrix(m2011))
Y_bar <- c(mm%*%fixef(m2011), mm%*%fixef(m2012), mm%*%fixef(m2013))
ggplot(t_means, aes(x=YEAR, y=Y_bar, color=TMT1:TMT2)) + geom_point() + geom_line()

I'm not exactly sure what you mean by "calculate the treatment means ... accounting for the levels of nesting in my experiment". The random effects in a mixed model are structured, normally-distributed deviations from the population-level effects (the fixed effects.) It may be instructive to look at the random effect estimates ranef(m2011) and the associated design matrix m2011@Zt.

So if all you want to plot are the population-level treatment means, you can simply work with the estimates of the fixed effects fixef(m2011) and the fixed-effect design matrix model.matrix(m2011) as above. If you would like to include some measure of uncertainty around the population-level predictions, or would like predictions for each block/plot/subplot, you will need to work with both the random and fixed effects. I suggest you begin by looking at http://glmm.wikidot.com/faq under the heading "Predictions and/or confidence (or prediction) intervals on predictions".

EDIT 8/26/2013:

You might consider bootMer() in the development version of lme4 for (parametric bootstrapped) confidence intervals around predictions, which should incorporate uncertainty in the variances of the random effects and will work with GLMMs (see for example this thread).

The idea is to simulate from the model of interest, refit with the simulated values, and calculate the statistic of interest from the refit model. You can go through the steps yourself, with simulate() and refit():

t_sim <- apply(simulate(model, 999), 2, function(x) combn(unique(model.matrix(model))%*%fixef(refit(model, x)), 2, diff) )

which generates 999 bootstrap reps of the pairwise differences between treatment means, that you could use quantile() on (or whatever flavor of bootstrap confidence interval you wish):

apply(t_sim, 1, function(.) quantile(., c(0.975, 0.025)))
0
jknowles On

With lme4 now I think bootMer() is probably the best way to go because it accounts for uncertainty of all kinds in the model. However, for certain classes of problems bootMer() is not able to work because of how long each model fit may take. For these larger problems, there is an R package called merTools which provides a predictInterval method to use arm::sim to account for uncertainty in the fixed and random effects as well as the residual error of the model. It is fairly easy to use and much faster for providing predictions in cases where the model takes a long time to fit. It has good coverage of the prediction interval produced by bootMer() for problems where the variance in the relationship among the random effects is fairly well defined.

To use it you would simply go:

library(merTools)
preds <- predictInterval(m2011, newdata = myData, level = 0.95, n.sims = 1000)

There are several other user configurable options, but the result is a prediction object similar to that produced when requesting prediction intervals from lm -- a three column data.frame with columns fit, lwr, and upr.