Contrasts for interaction effects via R's formula syntax

Question

Contrasts for interaction effects via R's formula syntax

57 views Asked by A.Fischer At 13 November 2023 at 18:57

I am trying to implement behavior of the fixest package regarding interaction effects and contrasts using only base R and the stats package.

Here is the behavior that I want to implement:

library(fixest)
df_het <- read.csv("https://raw.githubusercontent.com/s3alfisc/pyfixest/master/pyfixest/experimental/data/df_het.csv")

es <- feols(dep_var ~ i(state, year, ref = 1) , df_het)
es |> coef()
# (Intercept)  state::2:year  state::3:year  state::4:year 
# 6.589885e-01   2.531469e-04   3.816723e-05   5.276097e-04 
# state::5:year  state::6:year  state::7:year  state::8:year 
# 5.044601e-04   5.606175e-04   5.949685e-04   6.816374e-04 
# ...

Here, the ref argument of fixest::i() drops the column with the reference level 1.

If I use contrasts via stats instead, the reference level is not dropped:

df_het$state <- as.factor(df_het$state)
n = length(unique(df_het$state))
lm( dep_var ~ C(state, contr.treatment(n = n,base=1)):year, data=df_het ) |> coefficients()
# (Intercept) 
# -125.01546316 
# C(state, contr.treatment(n = n, base = 1))1:year 
# 0.06272086 
# C(state, contr.treatment(n = n, base = 1))2:year 
# 0.06293242 
# C(state, contr.treatment(n = n, base = 1))3:year 
# 0.06271744

Is there any way to mimic the behavior of fixest without writing a custom contrast and without using other packages, i.e. only relying on the base and stats packages?

I understand that one viable strategy would be to create the model matrix and to then drop the variable "by hand", which is something I'd prefer to avoid.

fit <- lm( dep_var ~ C(state, contr.treatment(n = n,base=1)):year, data=df_het )
model.matrix(fit)[,-2] |> colnames()
# > model.matrix(fit)[,-2] |> colnames()
# [1] "(Intercept)"                                      
# [2] "C(state, contr.treatment(n = n, base = 1))2:year" 
# [3] "C(state, contr.treatment(n = n, base = 1))3:year" 
# [4] "C(state, contr.treatment(n = n, base = 1))4:year" 
# [5] "C(state, contr.treatment(n = n, base = 1))5:year"

Original Q&A

There are 1 answers

**IRTFM** · Answer 1 · 2023-11-14T01:27:50+00:00

I can't really tell what's going on but you do apparently already understand that the intercept is the coefficient that you would find if feols allowed you to drop the R behavior of setting the Intercept to be the estimate for the base level. It sounds like you just want to see a relabeling with (Intercept) replaced by a label for the first level, in which case it would be a simple matter of changing the name of "(Intercept)" to "state::1:year". Whether you want to do that is not clear to me. When I plot all of the the coefficients I get a plot that looks like this:

If on the other hand I leave the "(Intercept)" aka "state::1:year" out I get what appears to be, possibly, a more informative plot:

So there is a definite upward trend from the 2-state to the 40-state. Admittedly this is with an Intercept that is several orders of magnitude different, so depending on the reality underlying this model building effort there might be some data munging errors.

TechQA.

Contrasts for interaction effects via R's formula syntax

There are 1 answers

Related Questions in R

Related Questions in FIXEST

Popular Questions

Popular Tags

Trending Questions