Multiple Categorical Variables in Patsy Formula

I have some data with multiple categorical variables.

I'd like to model them with a regression, using patsy formulas with statsmodels for convenience.

When only using one categorical variable, as in the formula 'C(Weekday, Treatment) - 1', it works as expected, removing the intercept and leaving me with a column in the design matrix for each category.

However, when using two different categorical variables, as in 'C(Status, Treatment) + C(Weekday, Treatment) - 1', the resulting design matrix does indeed have no intercept, but one of the levels of "Weekday" is also missing, as though I hadn't subtracted 1.
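For reference, a minimal sketch that should reproduce the behavior described above. The data is made up for illustration (the level names "a"/"b" and "Mon"/"Tue"/"Wed" are assumptions, not my real data), and it calls patsy's dmatrix directly rather than going through statsmodels.

```python
from patsy import dmatrix

# Hypothetical toy data: two categorical variables, six rows.
data = {
    "Status": ["a", "b", "a", "b", "a", "b"],
    "Weekday": ["Mon", "Tue", "Wed", "Mon", "Tue", "Wed"],
}

# One categorical variable, intercept removed.
one = dmatrix("C(Weekday, Treatment) - 1", data)
one_cols = one.design_info.column_names

# Two categorical variables, intercept removed.
two = dmatrix("C(Status, Treatment) + C(Weekday, Treatment) - 1", data)
two_cols = two.design_info.column_names

print(len(one_cols), one_cols)  # one column per Weekday level
print(len(two_cols), two_cols)  # all Status levels, but one Weekday level is dropped
```

With this toy data the first matrix has three columns (one per weekday), while the second has four rather than five: both Status levels appear, but only two of the three Weekday levels.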

Is there some statistical reason for this that I'm not seeing? Isn't removing the intercept sufficient to prevent perfect collinearity? Thanks
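One way to probe the collinearity question is to build the full set of indicator columns for both variables by hand and look for a linear dependency. The sketch below uses plain Python and the same made-up data; the helper one_hot is hypothetical and is not part of patsy. It only checks whether the full dummy columns of each variable add up to the same all-ones column, which would mean the two full sets are linearly dependent even with no intercept column present.

```python
# Hypothetical toy data, same as in the reproduction above.
status = ["a", "b", "a", "b", "a", "b"]
weekday = ["Mon", "Tue", "Wed", "Mon", "Tue", "Wed"]


def one_hot(values):
    """Return one row per observation with a 0/1 indicator for every level."""
    levels = sorted(set(values))
    return [[1 if v == level else 0 for level in levels] for v in values]


status_rows = one_hot(status)    # 2 indicator columns
weekday_rows = one_hot(weekday)  # 3 indicator columns

# Each observation belongs to exactly one level of each variable,
# so the indicators of a single variable sum to 1 in every row.
status_sums = [sum(row) for row in status_rows]
weekday_sums = [sum(row) for row in weekday_rows]

# (sum of Status columns) - (sum of Weekday columns), row by row.
difference = [s - w for s, w in zip(status_sums, weekday_sums)]
print(status_sums, weekday_sums, difference)
```

If the difference is zero in every row, the five full indicator columns are linearly dependent on their own, so dropping the intercept would not be enough to give a full-rank matrix, and dropping one level of the second variable would be consistent with avoiding that.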

There are 0 answers