I have a model formula with a few interaction terms. When I update the model via update()
, the *
operator gets dropped in favor of the expanded x + y + x:y
form. This isn't a huge deal, but when comparing models via anova()
(or elsewhere), it tends to make the model formulae less visually palatable, and changing the heading of the anova
object is inconvenient.
The issue is that terms.formula()
(via update.formula()
) doesn't seem to make it trivial to force simplification of the formula (should be via the simplify option?).
For example:
# Looking for Dep ~ Ind1 * Ind2 + Ind3
update.formula(Dep ~ Ind1 * Ind2, . ~ . + Ind3)
##Dep ~ Ind1 + Ind2 + Ind3 + Ind1:Ind2
Maybe passing simplify = TRUE
does the job?
update.formula(Dep ~ Ind1 * Ind2, . ~ . + Ind3, simplify = TRUE)
##Dep ~ Ind1 + Ind2 + Ind3 + Ind1:Ind2
Or the terms get re-ordered, and that precludes it from returning the simplified version?
update.formula(Dep ~ Ind1 * Ind2, . ~ . + Ind3, simplify = TRUE, keep.order = TRUE)
##Dep ~ Ind1 + Ind2 + Ind3 + Ind1:Ind2
The workaround I'm currently using is to just fit a new model, which is less convenient when the model call takes multiple additional arguments. Are there any better solutions to updating a formula object, while maintaining the simplified *
notation?
A slightly clearer use case (full factorial design):
i1 <- sample(c("A", "B"), 100, replace = TRUE)
i2 <- sample(c("C", "D"), 100, replace = TRUE)
i3 <- sample(c("E", "F"), 100, replace = TRUE)
i4 <- sample(c("G", "H"), 100, replace = TRUE)
d1 <- rnorm(100)
df <- data.frame(d1, i1, i2, i3, i4)
m1 <- lm(d1 ~ i1 * i2 * i3, data = df)
m2 <- update(m1, formula = . ~ . * i4)
m2s <- lm(d1 ~ i1 * i2 * i3 * i4, data = df) # Explicitly declare new model
anova(m1, m2)
##Analysis of Variance Table
##Model 1: d1 ~ i1 * i2 * i3
##Model 2: d1 ~ i1 + i2 + i3 + i4 + i1:i2 + i1:i3 + i2:i3 + i1:i4 + i2:i4 +
## i3:i4 + i1:i2:i3 + i1:i2:i4 + i1:i3:i4 + i2:i3:i4 + i1:i2:i3:i4
## Res.Df RSS Df Sum of Sq F Pr(>F)
##1 92 121.07
##2 84 118.70 8 2.3646 0.2092 0.9885
anova(m1, m2s)
##Analysis of Variance Table
##Model 1: d1 ~ i1 * i2 * i3
##Model 2: d1 ~ i1 * i2 * i3 * i4
## Res.Df RSS Df Sum of Sq F Pr(>F)
##1 92 121.07
##2 84 118.70 8 2.3646 0.2092 0.9885
simplify
interms.formula
does the opposite to what you think it does. You actually wantsimplify = FALSE
, but there's no way to do that using the defaultstats::update.formula
. Here's a version that does what you want. Note that the default method has just been changed to use my version of update_no_simplify.formula, and the formula method just changed to use simplify = FALSE:Output: