I am wondering if there is a simple way to change what values are in the intercept, perhaps mathematically, without re-running large models. As an example:
mtcars$cyl<-as.factor(mtcars$cyl)
summary(
lm(mpg~cyl+hp,data=mtcars)
)
Output:
Call:
lm(formula = mpg ~ cyl + hp, data = mtcars)
Residuals:
   Min     1Q Median     3Q    Max
-4.818 -1.959  0.080  1.627  6.812
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 28.65012    1.58779  18.044  < 2e-16 ***
cyl6        -5.96766    1.63928  -3.640  0.00109 **
cyl8        -8.52085    2.32607  -3.663  0.00103 **
hp          -0.02404    0.01541  -1.560  0.12995
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.146 on 28 degrees of freedom
Multiple R-squared: 0.7539, Adjusted R-squared: 0.7275
F-statistic: 28.59 on 3 and 28 DF, p-value: 1.14e-08
Now I can change the reference level to 6 cyl, and can see how 8 cyl now compares to 6 cyl, rather than 4 cyl:
mtcars$cyl<-relevel(mtcars$cyl,"6")
summary(
lm(mpg~cyl+hp,data=mtcars)
)
Output:
Call:
lm(formula = mpg ~ cyl + hp, data = mtcars)
Residuals:
   Min     1Q Median     3Q    Max
-4.818 -1.959  0.080  1.627  6.812
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 22.68246    2.22805   10.18 6.48e-11 ***
cyl4         5.96766    1.63928    3.64  0.00109 **
cyl8        -2.55320    1.97867   -1.29  0.20748
hp          -0.02404    0.01541   -1.56  0.12995
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.146 on 28 degrees of freedom
Multiple R-squared: 0.7539, Adjusted R-squared: 0.7275
F-statistic: 28.59 on 3 and 28 DF, p-value: 1.14e-08
What I am wondering is whether there is a way to get these values without re-running the model. You can see that the comparison between 4 cyl and 6 cyl is the same in each model (-5.96 and 5.96), but how would I get the estimate for the 'other' coefficient in either model (e.g. how to obtain the -2.55 from the first model)? Of course, in this case it takes a fraction of a second to run the other model, but with very large models it would be convenient to be able to change the reference level without re-running. Are there relatively simple ways to convert all of the estimates and standard errors to be based on a different reference level, or is it too complicated to do such a thing?
Any solutions for lme4, glmmTMB, or rstanarm models would be appreciated.
Here's a function that will give you the coefficients for every rearrangement of a given factor variable without having to run the model again or specify contrasts:
Suppose you had a model like this:
To see how your coefficients would change if Species were in a different order, you would just do:
Or with your own example:
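A minimal sketch of such a function, applied to the question's example. This is not the original rearrange_model_factors: the name relevel_coefs and its arguments are hypothetical, and it assumes an lm fit with default treatment contrasts and no interactions involving the factor. It rewrites each new coefficient as a linear combination of the fitted ones, so standard errors can be taken from vcov() in the same step.

```r
# Hypothetical helper (not the original rearrange_model_factors).
# Re-expresses the coefficients of one fitted lm for every choice of reference
# level of a factor, using only coef() and vcov() of the existing fit.
# Assumes treatment contrasts and no interactions involving the factor.
relevel_coefs <- function(model, factor_name) {
  b    <- coef(model)
  V    <- vcov(model)
  lvls <- model$xlevels[[factor_name]]           # factor levels, baseline first
  dummy_names <- paste0(factor_name, lvls[-1])   # e.g. "cyl6", "cyl8"
  stopifnot("(Intercept)" %in% names(b), all(dummy_names %in% names(b)))

  # Position of each level's coefficient in b (NA for the fitted baseline)
  pos  <- setNames(match(paste0(factor_name, lvls), names(b)), lvls)
  keep <- setdiff(names(b), c("(Intercept)", dummy_names))  # e.g. "hp"

  out <- lapply(lvls, function(ref) {
    others    <- setdiff(lvls, ref)
    new_names <- c("(Intercept)", paste0(factor_name, others), keep)
    # Each row of L writes one new coefficient in terms of the old ones
    L <- matrix(0, nrow = length(new_names), ncol = length(b),
                dimnames = list(new_names, names(b)))

    # New intercept = old intercept + old coefficient of the new reference
    L["(Intercept)", "(Intercept)"] <- 1
    if (!is.na(pos[ref])) L["(Intercept)", pos[ref]] <- 1

    # New level coefficient = old coefficient of that level - old coefficient of ref
    for (k in others) {
      row <- paste0(factor_name, k)
      if (!is.na(pos[k]))   L[row, pos[k]]   <- 1
      if (!is.na(pos[ref])) L[row, pos[ref]] <- -1
    }

    # All other coefficients are unchanged
    for (nm in keep) L[nm, nm] <- 1

    est <- drop(L %*% b)
    se  <- sqrt(diag(L %*% V %*% t(L)))
    cbind(Estimate = est, `Std. Error` = se)
  })
  setNames(out, lvls)
}

# Usage with the question's model, fitted once with 4 cylinders as baseline
mtcars2 <- transform(mtcars, cyl = factor(cyl))
fit <- lm(mpg ~ cyl + hp, data = mtcars2)
relevel_coefs(fit, "cyl")[["6"]]
```

The entry for "6" should match the estimates and standard errors of the second summary in the question. For mixed models the same linear-combination idea should carry over to the fixed effects and their covariance matrix, but this sketch relies on model$xlevels and has only been written with lm in mind.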
We need a bit of exposition to see why this works.
Although the function above only runs the model once, let's start by creating a list containing 3 versions of mtcars, where the baseline factor levels of cyl are all different.
Now we can extract the coefficients of your model for all three versions at once using lapply. For clarity, we will remove the hp coefficient, which remains static across all three versions anyway:
Now, we remind ourselves that the coefficient for each factor level is given relative to the baseline level. That means for the non-intercept coefficients, we can simply add the intercept value to their coefficients to get their absolute value. That means that these numbers represent the expected value for mpg when hp equals 0 for all three levels of cyl.
Since we now have all three values as absolutes, let's rename "Intercept" to the appropriate factor level:
Finally, let's rearrange the order so we can compare the absolute values of all three factor levels:
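The whole walk-through can be sketched in one piece as follows. This is a reconstruction under assumptions rather than the original reprex code: the object names (cars, versions, coefs, absolute) are made up for illustration, and the data is copied first so the built-in mtcars is left untouched.

```r
# Work on a copy with cyl as a factor
cars <- transform(mtcars, cyl = factor(cyl))

# Three versions of the data, one per baseline level of cyl
versions <- lapply(levels(cars$cyl), function(ref) {
  d <- cars
  d$cyl <- relevel(d$cyl, ref)
  d
})
names(versions) <- levels(cars$cyl)

# Fit the same model to each version and drop the hp coefficient
coefs <- lapply(versions, function(d) {
  b <- coef(lm(mpg ~ cyl + hp, data = d))
  b[names(b) != "hp"]
})

# Add the intercept to the non-intercept coefficients, rename the intercept
# after its baseline level, and put the levels in a common order
absolute <- Map(function(b, ref) {
  b[-1] <- b[-1] + b[1]
  names(b)[1] <- paste0("cyl", ref)
  b[sort(names(b))]
}, coefs, names(coefs))

absolute
```

Each element of absolute holds the expected mpg at hp = 0 for 4, 6 and 8 cylinders, computed from a fit with a different baseline.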
We can see they are all the same. This is why the ordering of factors is arbitrary in lm: changing the order of the factor levels gives the same numerical predictions in the end, even if the summary appears different.
TL;DR
So the answer to your question of where to get the -2.55 if you only have the first model is: find the difference between the non-intercept coefficients. In this case:
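With the rounded estimates printed in the first summary, that subtraction looks like this (a small sketch; the variable names are only for illustration):

```r
# Estimates copied from the first summary (baseline: 4 cylinders)
cyl6 <- -5.96766
cyl8 <- -8.52085

# 8 cylinders relative to 6 cylinders
cyl8 - cyl6
```

The result agrees with the cyl8 estimate in the second summary up to rounding in the last printed digit.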
Alternatively, add the intercept to the non-intercept coefficients to see what the intercept would be if any of the levels were the baseline; you can then get the coefficient for any level relative to any other by simple subtraction. That's how my function rearrange_model_factors works.
Created on 2020-10-05 by the reprex package (v0.3.0)