Treatment effect on unbalanced panel data

I have an unbalanced dataset of movie sales, along with some characteristics of each movie, covering several years. A treatment (an event) occurred in society in a specific year partway through that period. I want to check in R whether this treatment affected the sales of movies with certain characteristics. My issue is that in the DiD and FE models I have looked at, the treated population is the same before and after the treatment, which is not true in my case: the movies released before the event are completely different from the ones released after it. I am looking for any change in the coefficient of a movie characteristic on its sales. Could you please advise which model or R package I should use?

There is 1 answer

Tilt On

You could fit a linear model with lm() from the stats package, using a formula like this:

# 'movies' is a placeholder name for your data frame
lm(sales ~ treatment + characteristic1 + characteristic2 + characteristic_n, data = movies)

This estimates how much of the variation in sales is associated with each of your variables (namely the characteristics), holding the others fixed. However, it is difficult to answer your question without an idea of what your dataset looks like. For a simple linear model to work, your residuals need to be approximately normally distributed and the variances homogeneous, among other assumptions.
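As a rough way to check those two assumptions, here is a minimal sketch on simulated data. The data frame `movies` and the columns `treatment`, `budget`, and `sales` are made up for illustration and are not from the question; swap in your own variables.

```r
# Simulated stand-in data; every name and number here is hypothetical
set.seed(1)
n <- 200
movies <- data.frame(
  treatment = rbinom(n, 1, 0.5),
  budget    = rnorm(n, mean = 50, sd = 10)
)
movies$sales <- 10 + 5 * movies$treatment + 0.8 * movies$budget + rnorm(n, sd = 4)

fit <- lm(sales ~ treatment + budget, data = movies)

# Formal test of residual normality (a small p-value suggests non-normal residuals)
sw <- shapiro.test(residuals(fit))
print(sw)

# Visual checks: residuals vs fitted values (homogeneity of variance),
# then a normal Q-Q plot (normality)
plot(fit, which = 1)
plot(fit, which = 2)
```

If the residuals-vs-fitted plot fans out, a log transform of sales is one common thing to try, although whether it is appropriate depends on your data.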

ADDENDUM 1: Since your treatment is an event that affects all movies in the US from 2011 onward, you should code it as a 0/1 variable with something like

# 'movies' is a placeholder name for your data frame
movies$treatment <- ifelse(movies$year >= 2011 & movies$production_country == "United States", 1, 0)

Then, if you want to know how the treatment changes the coefficient of some other characteristic, what you are after is the interaction between the treatment and that characteristic. This is coded with a * like so:

lm(sales ~ treatment * characteristic_of_interest, data = movies)
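To show what the interaction term gives you, here is a small sketch on simulated data where each row is a different movie, so nothing needs to be observed both before and after the event. The names `movies` and `is_sequel`, and all the numbers, are invented for the example.

```r
# Simulated stand-in data; every name and number here is hypothetical
set.seed(42)
n <- 400
movies <- data.frame(
  year      = sample(2006:2015, n, replace = TRUE),
  is_sequel = rbinom(n, 1, 0.3)
)
movies$treatment <- ifelse(movies$year >= 2011, 1, 0)

# By construction, the effect of is_sequel is 2 before the event
# and 2 + 3 = 5 after it
movies$sales <- 20 + 1 * movies$treatment + 2 * movies$is_sequel +
  3 * movies$treatment * movies$is_sequel + rnorm(n, sd = 1)

fit <- lm(sales ~ treatment * is_sequel, data = movies)
print(summary(fit)$coefficients)

# Estimated change in the is_sequel coefficient after the event
interaction_estimate <- coef(fit)["treatment:is_sequel"]
print(interaction_estimate)
```

The `treatment:is_sequel` row is the one to look at: it estimates how much the coefficient of the characteristic shifts once the treatment is in place.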

It is important to think carefully beforehand about which characteristic should be influenced by the treatment, and not to test every possible combination. I don't know how many individual movies you have (i.e. how large your n is), but if you put an interaction on every term you might have a hard time estimating the coefficients.

ALSO, you should think about the structure of your data. If you have multiple movies from the same country, multiple movies in the same year, and multiple movies in the same genre, these factors may influence sales, so it is important to include them in your model. If they are not variables you are interested in and they have many categories, you may include them as random effects. For example, the year a movie came out may influence its sales because it was a recession year, or because there was a pandemic, or for any other reason we can't quite grasp. This is a good example of when we would code year as a random effect (although there is MUCH dissent on what should or should not be used as a random effect, and on when something should be a fixed effect rather than a random effect; you can read about this here).

You can use the lme4 or nlme packages to code random effects in your model. I like lme4 because of the simplicity of the coding of random effects and because it doesn't give back p-values. To get you started, here is how you would code a random effects model in lme4:

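library(lme4)
# Random intercept for each production year; 'movies' is a placeholder name for your data frame
lmer(sales ~ genre * treatment + (1 | production_year), data = movies)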
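For a version you can run end to end, here is a sketch on simulated data. It assumes lme4 is installed, and the data frame `movies`, its columns, and the effect sizes are all invented for illustration.

```r
library(lme4)

# Simulated stand-in data; every name and number here is hypothetical
set.seed(7)
years <- 2006:2015
n <- 500
movies <- data.frame(
  production_year = sample(years, n, replace = TRUE),
  genre           = factor(sample(c("action", "drama"), n, replace = TRUE))
)
movies$treatment <- ifelse(movies$production_year >= 2011, 1, 0)

# One random shock per year (recession, pandemic, ...) shared by all movies of that year
year_shock <- setNames(rnorm(length(years), sd = 2), years)

# By construction, drama sales rise by 1.5 after the event; action sales do not
movies$sales <- 20 +
  unname(year_shock[as.character(movies$production_year)]) +
  1.5 * (movies$genre == "drama") * movies$treatment +
  rnorm(n, sd = 1)

fit <- lmer(sales ~ genre * treatment + (1 | production_year), data = movies)

# Fixed-effect estimates, including the genre-by-treatment interaction
print(fixef(fit))

# Estimated between-year and residual variance components
print(VarCorr(fit))
```

Because the treatment is constant within a year, its main effect is only identified from differences between years, so with few years it will be estimated imprecisely. The interaction with genre is compared within years and is usually much better determined.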

Let us know how it works!