I'm trying to analyze the total active minutes per user before and after an experiment. I've included the associated user data from before and after the experiment; variant_number = 0 indicates the control group and 1 the treatment group. Specifically, I'm interested in the mean (the average total active minutes per user).
First, I calculated the before-after difference in the treatment outcome and the before-after difference in the control outcome (-183.7 and 19.4, respectively). Taking treatment minus control, the difference in differences is -183.7 - 19.4 = -203.1, i.e. 203.1 in magnitude.
How can I use Python to construct a 95% confidence interval for the difference in differences? (I can provide more code/context if needed.)
You can use a linear model and measure the interaction effect (the `group[T.1]:period[T.pre]` term). The average difference in differences for these simulated data is -223.1779, the p-value for the interaction is p < 5e-4, so it is highly significant, and the 95% confidence interval is [-276.360, -169.995].
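The original code for this step did not survive, so here is a minimal sketch of the interaction-model approach, assuming `statsmodels` and `pandas` are available. The data are simulated stand-ins (not your data), and the column names `group`, `period` and `active_mins`, the cell means and the sample size are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated stand-in data (NOT the asker's data): one row per user and period.
rng = np.random.default_rng(0)
n = 500  # assumed number of users per group and period
cells = [(0, "pre", 1000), (0, "post", 1020), (1, "pre", 1000), (1, "post", 820)]
df = pd.concat(
    [
        pd.DataFrame({
            "group": group,
            "period": period,
            "active_mins": rng.normal(loc=mean, scale=300, size=n),
        })
        for group, period, mean in cells
    ],
    ignore_index=True,
)

# Categorical columns so the formula treats both as factors; the first
# category is the reference level (group 0, period "post").
df["group"] = pd.Categorical(df["group"], categories=[0, 1])
df["period"] = pd.Categorical(df["period"], categories=["post", "pre"])

# "group * period" expands to both main effects plus their interaction.
model = smf.ols("active_mins ~ group * period", data=df).fit()

term = "group[T.1]:period[T.pre]"
did_estimate = model.params[term]
ci_low, ci_high = model.conf_int(alpha=0.05).loc[term]

print(model.summary().tables[1])
print(f"interaction = {did_estimate:.1f}, 95% CI = [{ci_low:.1f}, {ci_high:.1f}]")
```

Because "post" is the reference level here, the interaction coefficient equals (treatment pre - post) minus (control pre - post). If you define the difference in differences as after minus before, flip the sign of the estimate and of both interval endpoints.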
EDIT:
Since your summary statistics show that your distribution is heavily skewed, bootstrapping is actually a more reliable method to estimate confidence intervals:
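A minimal sketch of a percentile bootstrap for the difference in differences, using only NumPy. The data are simulated, right-skewed stand-ins (not your data), and the array names, sample sizes and number of resamples are assumptions. The sketch also assumes the four samples are independent; if the same users are observed before and after, it would be more appropriate to resample users and keep each user's pre/post pair together.

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated, right-skewed stand-in data (NOT the asker's data).
n = 400  # assumed number of users per group and period
control_pre = rng.exponential(scale=1000, size=n)
control_post = rng.exponential(scale=1020, size=n)
treat_pre = rng.exponential(scale=1000, size=n)
treat_post = rng.exponential(scale=820, size=n)


def diff_in_diff(t_pre, t_post, c_pre, c_post):
    """(treatment after - before) minus (control after - before), on means."""
    return (t_post.mean() - t_pre.mean()) - (c_post.mean() - c_pre.mean())


def resample(x):
    """Draw a bootstrap sample of the same size, with replacement."""
    return rng.choice(x, size=x.size, replace=True)


n_boot = 5000  # assumed number of bootstrap replicates
boot = np.empty(n_boot)
for i in range(n_boot):
    boot[i] = diff_in_diff(
        resample(treat_pre), resample(treat_post),
        resample(control_pre), resample(control_post),
    )

point_estimate = diff_in_diff(treat_pre, treat_post, control_pre, control_post)
# Percentile interval: the 2.5th and 97.5th percentiles of the replicates.
ci_low, ci_high = np.percentile(boot, [2.5, 97.5])

print(f"DiD = {point_estimate:.1f}, 95% bootstrap CI = [{ci_low:.1f}, {ci_high:.1f}]")
```

The percentile interval is the simplest variant; with strongly skewed data a bias-corrected interval may behave better, at the cost of extra code.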