Is it possible to update the formula of a model object while maintaining simplified notation?

385 views Asked by At

I have a model formula with a few interaction terms. When I update the model via update(), the * operator gets dropped in favor of the expanded x + y + x:y form. This isn't a huge deal, but when comparing models via anova() (or elsewhere), it tends to make the model formulae less visually palatable, and changing the heading of the anova object is inconvenient.

The issue is that terms.formula() (via update.formula()) doesn't seem to make it trivial to force simplification of the formula (should be via the simplify option?).

For example:

# Looking for Dep ~ Ind1 * Ind2 + Ind3

update.formula(Dep ~ Ind1 * Ind2, . ~ . + Ind3)
##Dep ~ Ind1 + Ind2 + Ind3 + Ind1:Ind2

Maybe passing simplify = TRUE does the job?

update.formula(Dep ~ Ind1 * Ind2, . ~ . + Ind3, simplify = TRUE)
##Dep ~ Ind1 + Ind2 + Ind3 + Ind1:Ind2

Or the terms get re-ordered, and that precludes it from returning the simplified version?

update.formula(Dep ~ Ind1 * Ind2, . ~ . + Ind3, simplify = TRUE, keep.order = TRUE)
##Dep ~ Ind1 + Ind2 + Ind3 + Ind1:Ind2

The workaround I'm currently using is to just fit a new model, which is less convenient when the model call takes multiple additional arguments. Are there any better solutions to updating a formula object, while maintaining the simplified * notation?


A slightly clearer use case (full factorial design):

i1 <- sample(c("A", "B"), 100, replace = TRUE)
i2 <- sample(c("C", "D"), 100, replace = TRUE)
i3 <- sample(c("E", "F"), 100, replace = TRUE)
i4 <- sample(c("G", "H"), 100, replace = TRUE)
d1 <- rnorm(100)
df <- data.frame(d1, i1, i2, i3, i4)

m1 <- lm(d1 ~ i1 * i2 * i3, data = df)
m2 <- update(m1, formula = . ~ . * i4)
m2s <- lm(d1 ~ i1 * i2 * i3 * i4, data  = df) # Explicitly declare new model

anova(m1, m2)
##Analysis of Variance Table

##Model 1: d1 ~ i1 * i2 * i3
##Model 2: d1 ~ i1 + i2 + i3 + i4 + i1:i2 + i1:i3 + i2:i3 + i1:i4 + i2:i4 + 
##    i3:i4 + i1:i2:i3 + i1:i2:i4 + i1:i3:i4 + i2:i3:i4 + i1:i2:i3:i4
##  Res.Df    RSS Df Sum of Sq      F Pr(>F)
##1     92 121.07                           
##2     84 118.70  8    2.3646 0.2092 0.9885

anova(m1, m2s)
##Analysis of Variance Table

##Model 1: d1 ~ i1 * i2 * i3
##Model 2: d1 ~ i1 * i2 * i3 * i4
##  Res.Df    RSS Df Sum of Sq      F Pr(>F)
##1     92 121.07                           
##2     84 118.70  8    2.3646 0.2092 0.9885
1

There are 1 answers

2
Nick Kennedy On BEST ANSWER

simplify in terms.formula does the opposite to what you think it does. You actually want simplify = FALSE, but there's no way to do that using the default stats::update.formula. Here's a version that does what you want. Note that the default method has just been changed to use my version of update_no_simplify.formula, and the formula method just changed to use simplify = FALSE:

update_no_simplify <- function(object, ...) {
  UseMethod("update_no_simplify")
}

update_no_simplify.formula <- function(old, new) {
  tmp <- .Call(stats:::C_updateform, as.formula(old), as.formula(new))
  formula(terms.formula(tmp, simplify = FALSE))
}

update_no_simplify.default <- function (object, formula., ..., evaluate = TRUE) {
  if (is.null(call <- getCall(object))) 
    stop("need an object with call component")
  extras <- match.call(expand.dots = FALSE)$...
  if (!missing(formula.)) 
    call$formula <- update_no_simplify.formula(formula(object), formula.)
  if (length(extras)) {
    existing <- !is.na(match(names(extras), names(call)))
    for (a in names(extras)[existing]) call[[a]] <- extras[[a]]
    if (any(!existing)) {
      call <- c(as.list(call), extras[!existing])
      call <- as.call(call)
    }
  }
  if (evaluate) 
    eval(call, parent.frame())
  else call
}

m3 <- update_no_simplify(m1, . ~ . * i4)
anova(m1, m3)

Output:

##Analysis of Variance Table
##
##Model 1: d1 ~ i1 * i2 * i3
##Model 2: d1 ~ i1 * i2 * i3 * i4
##  Res.Df    RSS Df Sum of Sq     F Pr(>F)
##1     92 95.496                          
##2     84 89.193  8    6.3032 0.742 0.6542