Equatiomatic notation includes fixed effect slope in the distribution of a random effect intercept


The vignette for {equatiomatic} includes the following example (here):

library(lme4)
library(equatiomatic)

lev1_long <- lmer(score ~ wave + (1|sid) + (1|school) + (1|district),
                  data = sim_longitudinal)
extract_eq(lev1_long)

which produces the following notation:

[screenshot of the extract_eq() output for lev1_long]

Each random intercept is denoted on its own line (I presume it is implicit that they are added together to form the combined intercept?), and the fixed slope for wave, denoted $\beta_1$, is included in the first line, in the distribution of the outcome variable score. I can reproduce this example successfully.
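
For reference, the structure of that output is roughly the following (a hand-typed sketch based on the description above, not a copy of the vignette image, so the exact symbols may differ slightly):

$$
\begin{aligned}
  \operatorname{score}_{i} &\sim N\left(\alpha_{j[i],k[i],l[i]} + \beta_{1}(\operatorname{wave}),\ \sigma^2\right) \\
  \alpha_{j} &\sim N\left(\mu_{\alpha_{j}},\ \sigma^2_{\alpha_{j}}\right), \text{ for sid } j = 1, \dots, J \\
  \alpha_{k} &\sim N\left(\mu_{\alpha_{k}},\ \sigma^2_{\alpha_{k}}\right), \text{ for school } k = 1, \dots, K \\
  \alpha_{l} &\sim N\left(\mu_{\alpha_{l}},\ \sigma^2_{\alpha_{l}}\right), \text{ for district } l = 1, \dots, L
\end{aligned}
$$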

However, when I work with some data that I think has a very similar structure, I get a different notation. Here is a reproducible example with a dummy dataset:

library(tidyverse)
library(lme4)
library(equatiomatic)

mock_df <- tibble(
  outcome = 1:10,
  oID = c('D', 'A', 'B', 'C', 'B', 'B', 'E', 'B', 'A', 'C'),
  treatment = rep(c(0, 1), each = 5),
  pID = c('P1', 'P1', 'P2', 'P2', 'P2', 'P3', 'P3', 'P3', 'P4', 'P4'))

model <- lmer(outcome ~ 1 + treatment + (1|pID) + (1|oID), data = mock_df)
extract_eq(model)

Of course, the model is completely inadequate for the made-up data, but ignore that for a moment. The {equatiomatic} output is:

[screenshot of the extract_eq() output for model]

Just like in the vignette example, we have multiple random intercepts and one fixed slope. But the equation looks quite different. The slope has been moved to the third line, into the distribution of the random intercept for pID.
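
Roughly, the structure I get looks something like this (again typed out by hand, so the symbols may not match the screenshot exactly):

$$
\begin{aligned}
  \operatorname{outcome}_{i} &\sim N\left(\alpha_{j[i],k[i]},\ \sigma^2\right) \\
  \alpha_{j} &\sim N\left(\mu_{\alpha_{j}},\ \sigma^2_{\alpha_{j}}\right), \text{ for oID } j = 1, \dots, J \\
  \alpha_{k} &\sim N\left(\gamma_{0}^{\alpha} + \gamma_{1}^{\alpha}(\operatorname{treatment}),\ \sigma^2_{\alpha_{k}}\right), \text{ for pID } k = 1, \dots, K
\end{aligned}
$$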

I do not really understand the reason for this difference in notation, or, for that matter, why the slope went into the distribution for the pID intercept rather than the oID intercept. Is there some fundamental difference in model structure between the vignette example and my mock example that justifies the difference in notation?


1 Answer

Answer by dipetkov:

This illustrates why the lmer formula alone is not sufficient to specify the mixed-effects model. We also have to state, for each fixed predictor, whether it describes the individuals or the groups.

Notice how in your mock data treatment doesn't vary within pIDs because treatment is assigned to pIDs, not to individuals; all individuals with the same pID label have the same treatment.

mock_df
#> # A tibble: 10 × 4
#>    outcome oID   treatment pID  
#>      <int> <chr>     <dbl> <chr>
#>  1       1 D             0 P1   
#>  2       2 A             0 P1   
#>  3       3 B             0 P2   
#>  4       4 C             0 P2   
#>  5       5 B             0 P2   
#>  6       6 B             1 P3   
#>  7       7 E             1 P3   
#>  8       8 B             1 P3   
#>  9       9 A             1 P4   
#> 10      10 C             1 P4

We can summarize this observation by counting treatment assignments by pID. For comparison, we also count treatment assignments by oID.

treatment_by_group <- function(df, group) {
  df %>%
    count(treatment, {{ group }}) %>%
    pivot_wider(
      names_from = treatment,
      values_from = n,
      values_fill = 0L
    )
}

treatment_by_group(mock_df, pID)
#> # A tibble: 4 × 3
#>   pID     `0`   `1`
#>   <chr> <int> <int>
#> 1 P1        2     0
#> 2 P2        3     0
#> 3 P3        0     3
#> 4 P4        0     2
treatment_by_group(mock_df, oID)
#> # A tibble: 5 × 3
#>   oID     `0`   `1`
#>   <chr> <int> <int>
#> 1 A         1     1
#> 2 B         2     2
#> 3 C         1     1
#> 4 D         1     0
#> 5 E         0     1

Since treatment assignment doesn't vary within pIDs, we cannot learn about the differences between treatments by comparing individuals within pIDs; we have to compare pIDs instead.

This nesting of individuals within pIDs makes the model hierarchical. Effectively, the model has two components: a regression for each pID and then a regression for each observation given its pID. That's exactly what you see in the equatiomatic model specification.
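
One way to write these two components explicitly (a sketch with my own symbols, where $b_{j[i]}$ is the oID intercept and $k[i]$ indexes the pID of observation $i$):

$$
\begin{aligned}
  \text{observation level:} \quad \operatorname{outcome}_{i} &= \alpha_{k[i]} + b_{j[i]} + \varepsilon_{i} \\
  \text{pID level:} \quad \alpha_{k} &= \gamma_{0} + \gamma_{1} \operatorname{treatment}_{k} + u_{k}
\end{aligned}
$$

Here $\operatorname{treatment}_{k}$ is well defined at the pID level precisely because treatment is constant within each pID.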

Contrast the grouping by pID with the grouping by oID. We can estimate the difference between treatments by comparing individuals within oIDs (more precisely, by comparing individuals in the A, B and C oIDs).
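
As a rough check of this intuition (the exact rendering may differ, and the fit will likely warn about a singular fit on such a tiny made-up dataset), you could refit with only the oID grouping; since treatment varies within oIDs, I'd expect the slope to stay on the observation-level line:

# Sketch: keep only the oID random intercept.
# treatment varies within oIDs, so extract_eq() should leave the
# treatment slope in the observation-level equation.
model_oID <- lmer(outcome ~ 1 + treatment + (1 | oID), data = mock_df)
extract_eq(model_oID)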

PS: I wrote a longer explanation, with more formulas, in response to this Cross Validated question: Writing mathematical equations (mixed effect models)