Contrasts for interaction effects via R's formula syntax

67 views Asked by At

I am trying to implement behavior of the fixest package regarding interaction effects and contrasts using only base R and the stats package.

Here is the behavior that I want to implement:

library(fixest)
df_het <- read.csv("https://raw.githubusercontent.com/s3alfisc/pyfixest/master/pyfixest/experimental/data/df_het.csv")

es <- feols(dep_var ~ i(state, year, ref = 1) , df_het)
es |> coef()
# (Intercept)  state::2:year  state::3:year  state::4:year 
# 6.589885e-01   2.531469e-04   3.816723e-05   5.276097e-04 
# state::5:year  state::6:year  state::7:year  state::8:year 
# 5.044601e-04   5.606175e-04   5.949685e-04   6.816374e-04 
# ... 

Here, the ref argument of fixest::i() drops the column with the reference level 1.

If I use contrasts via stats instead, the reference level is not dropped:

df_het$state <- as.factor(df_het$state)
n = length(unique(df_het$state))
lm( dep_var ~ C(state, contr.treatment(n = n,base=1)):year, data=df_het ) |> coefficients()
# (Intercept) 
# -125.01546316 
# C(state, contr.treatment(n = n, base = 1))1:year 
# 0.06272086 
# C(state, contr.treatment(n = n, base = 1))2:year 
# 0.06293242 
# C(state, contr.treatment(n = n, base = 1))3:year 
# 0.06271744 

Is there any way to mimic the behavior of fixest without writing a custom contrast and without using other packages, i.e. only relying on the base and stats packages?

I understand that one viable strategy would be to create the model matrix and to then drop the variable "by hand", which is something I'd prefer to avoid.

fit <- lm( dep_var ~ C(state, contr.treatment(n = n,base=1)):year, data=df_het )
model.matrix(fit)[,-2] |> colnames()
# > model.matrix(fit)[,-2] |> colnames()
# [1] "(Intercept)"                                      
# [2] "C(state, contr.treatment(n = n, base = 1))2:year" 
# [3] "C(state, contr.treatment(n = n, base = 1))3:year" 
# [4] "C(state, contr.treatment(n = n, base = 1))4:year" 
# [5] "C(state, contr.treatment(n = n, base = 1))5:year" 
1

There are 1 answers

0
IRTFM On

I can't really tell what's going on but you do apparently already understand that the intercept is the coefficient that you would find if feols allowed you to drop the R behavior of setting the Intercept to be the estimate for the base level. It sounds like you just want to see a relabeling with (Intercept) replaced by a label for the first level, in which case it would be a simple matter of changing the name of "(Intercept)" to "state::1:year". Whether you want to do that is not clear to me. When I plot all of the the coefficients I get a plot that looks like this:

Intercept and other coeffs

If on the other hand I leave the "(Intercept)" aka "state::1:year" out I get what appears to be, possibly, a more informative plot:

enter image description here

So there is a definite upward trend from the 2-state to the 40-state. Admittedly this is with an Intercept that is several orders of magnitude different, so depending on the reality underlying this model building effort there might be some data munging errors.