I am trying to implement behavior of the fixest
package regarding interaction effects and contrasts using only base R and the stats
package.
Here is the behavior that I want to implement:
library(fixest)
df_het <- read.csv("https://raw.githubusercontent.com/s3alfisc/pyfixest/master/pyfixest/experimental/data/df_het.csv")
es <- feols(dep_var ~ i(state, year, ref = 1) , df_het)
es |> coef()
# (Intercept) state::2:year state::3:year state::4:year
# 6.589885e-01 2.531469e-04 3.816723e-05 5.276097e-04
# state::5:year state::6:year state::7:year state::8:year
# 5.044601e-04 5.606175e-04 5.949685e-04 6.816374e-04
# ...
Here, the ref
argument of fixest::i()
drops the column with the reference level 1
.
If I use contrasts via stats
instead, the reference level is not dropped:
df_het$state <- as.factor(df_het$state)
n = length(unique(df_het$state))
lm( dep_var ~ C(state, contr.treatment(n = n,base=1)):year, data=df_het ) |> coefficients()
# (Intercept)
# -125.01546316
# C(state, contr.treatment(n = n, base = 1))1:year
# 0.06272086
# C(state, contr.treatment(n = n, base = 1))2:year
# 0.06293242
# C(state, contr.treatment(n = n, base = 1))3:year
# 0.06271744
Is there any way to mimic the behavior of fixest
without writing a custom contrast and without using other packages, i.e. only relying on the base
and stats
packages?
I understand that one viable strategy would be to create the model matrix and to then drop the variable "by hand", which is something I'd prefer to avoid.
fit <- lm( dep_var ~ C(state, contr.treatment(n = n,base=1)):year, data=df_het )
model.matrix(fit)[,-2] |> colnames()
# > model.matrix(fit)[,-2] |> colnames()
# [1] "(Intercept)"
# [2] "C(state, contr.treatment(n = n, base = 1))2:year"
# [3] "C(state, contr.treatment(n = n, base = 1))3:year"
# [4] "C(state, contr.treatment(n = n, base = 1))4:year"
# [5] "C(state, contr.treatment(n = n, base = 1))5:year"
I can't really tell what's going on but you do apparently already understand that the intercept is the coefficient that you would find if
feols
allowed you to drop the R behavior of setting the Intercept to be the estimate for the base level. It sounds like you just want to see a relabeling with(Intercept)
replaced by a label for the first level, in which case it would be a simple matter of changing the name of "(Intercept)" to "state::1:year". Whether you want to do that is not clear to me. When I plot all of the the coefficients I get a plot that looks like this:If on the other hand I leave the "(Intercept)" aka "state::1:year" out I get what appears to be, possibly, a more informative plot:
So there is a definite upward trend from the 2-state to the 40-state. Admittedly this is with an Intercept that is several orders of magnitude different, so depending on the reality underlying this model building effort there might be some data munging errors.