I am trying to run different multiple linear regression models in R with my sales dataset containing over 1000 products and 250 retailers. I am interested in looking at the model coefficients for each product and retailer combination. I tried using dummy variables for the categorical column but it didn't produce the individual estimates of the coefficients like I needed. How can one achieve this using a for loop that iterates over each possible combination?
Create a loop in R to run all possible combinations of categorical variables (levels) in regression
348 views. Asked by asterisk21.
There are 2 answers
How about something like this?
library(lme4)
my_data <- transform(my_data, prcomb = interaction(product, retailer, drop = TRUE))
result <- lmList(sales ~ x1 + x2 + x3 | prcomb, data = my_data)
(I included drop = TRUE because lmList doesn't like empty categories. Hopefully you have fewer than the full 250,000 product/retailer combinations ...?)
For example:
mt <- transform(mtcars, cylam = interaction(cyl, am, drop = TRUE))
mm <- lme4::lmList(mpg ~ wt | cylam, data = mt)
coef(mm)
summary(mm)
(Note that the summary() method doesn't seem to be very graceful about categories with only one element in them ...)
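Since the question asks specifically for a for loop, here is a minimal base R sketch of the same idea, again on mtcars with cyl and am standing in for product and retailer. The names groups, fits and coefs are just illustrative, and mpg ~ wt is a placeholder for whatever formula you actually need.

```r
# Illustration only: cyl and am stand in for product and retailer.
mt <- transform(mtcars, cylam = interaction(cyl, am, drop = TRUE))

# One data frame per combination that actually occurs in the data.
groups <- split(mt, mt$cylam)

# Pre-allocate a named list, then fit one model per combination.
fits <- vector("list", length(groups))
names(fits) <- names(groups)
for (g in names(groups)) {
  fits[[g]] <- lm(mpg ~ wt, data = groups[[g]])
}

# Stack the coefficient vectors: one row per combination.
coefs <- do.call(rbind, lapply(fits, coef))
coefs
```

Combinations with fewer observations than coefficients will give NA estimates, so with real sales data you may want to skip very small groups before fitting.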
I like this tidyverse approach, where we nest the data within each group, i.e. "cut" and "color", standins for your "product" and "retailer" variables. Then we can run a linear regression within each group on all the other variables.
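The code for this answer did not survive on this page, so what follows is only a sketch of the nest-and-map pattern described, not the original answer's code. It assumes the ggplot2 diamonds dataset (which has "cut" and "color" columns), price as the response, and a handful of numeric predictors chosen for brevity; the names models and coef_table are illustrative.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

# Assumption: the diamonds data from ggplot2 is the example dataset.
data(diamonds, package = "ggplot2")

models <- diamonds %>%
  # Keep the grouping columns, the response and a few numeric predictors.
  select(cut, color, price, carat, depth, table) %>%
  # One row per cut/color combination, with its data in a list-column.
  nest(data = -c(cut, color)) %>%
  mutate(
    # Regress price on all remaining columns within each group.
    fit = map(data, ~ lm(price ~ ., data = .x)),
    # Tidy each fit into a data frame of terms and estimates.
    coefs = map(fit, tidy)
  )

# Long table: one row per combination and coefficient.
coef_table <- models %>%
  select(cut, color, coefs) %>%
  unnest(coefs)

coef_table
```

For the sales data you would swap in product and retailer for cut and color, and your own response and predictors for the columns selected here.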