A/B Test statistics

I am trying to analyse the results of several A/B tests statistically to determine which alternative is better, but I have found conflicting information about how to do this.

First, I am interested in several kinds of tests:

  • Tests that measure success by counting events, such as conversions or emails sent
  • Tests that measure success by counting revenue
  • Tests that have only two alternatives (control and new)
  • Tests that have multiple alternatives (control and multiple new)

I was hoping to find a simple set of formulae or rules for doing this analysis but have found more questions than answers.

This site says that you can't compare multi-alternative tests; you can only do pairwise comparisons and do a chi-squared analysis to see if the whole test is statistically significant or not.
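For reference, the whole-test chi-squared check described above can be sketched as follows. This is only an illustration of Pearson's chi-squared statistic on a table of converted versus not-converted counts; the function name and the example counts are hypothetical, and the 5% critical value for 3 degrees of freedom (7.815) is taken from standard tables rather than computed.

```python
def chi_squared_statistic(conversions, visitors):
    """Pearson chi-squared statistic for a k x 2 table
    (converted / not converted), one row per alternative."""
    pooled_rate = sum(conversions) / sum(visitors)
    stat = 0.0
    for conv, vis in zip(conversions, visitors):
        # Expected counts if every alternative shared the pooled rate.
        expected_conv = vis * pooled_rate
        expected_miss = vis * (1 - pooled_rate)
        stat += (conv - expected_conv) ** 2 / expected_conv
        stat += ((vis - conv) - expected_miss) ** 2 / expected_miss
    degrees_of_freedom = len(conversions) - 1
    return stat, degrees_of_freedom


# Hypothetical A/B/C/D data: conversions out of 1000 visitors each.
stat, dof = chi_squared_statistic([30, 45, 38, 50], [1000, 1000, 1000, 1000])

# Critical value of chi-squared with 3 degrees of freedom at the 5% level.
CRITICAL_5_PERCENT_DOF_3 = 7.815
whole_test_significant = stat > CRITICAL_5_PERCENT_DOF_3
```

A significant result here only says that the alternatives are unlikely to all share one conversion rate; it does not say which alternative is the best.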

This site suggests a way to do A/B/C/D testing (starting on slide 74), analysing the results with the G-test (which it says is related to chi-squared), but it isn't clear on the details of using a fudge factor. It also suggests that you can only use the A/B/C/D approach to eliminate alternatives until you end up with a clear winner in an A/B comparison.

This site gives an example of an A/B/C/D test (including control) and shows how to compare the conversion rates to determine a winner. Unlike the previous approach, it does not recommend eliminating alternatives but instead picks a winner right away (assuming statistically significant results).

Perhaps I'm naive, but I would think that by now a stats analysis library would exist to deal with this very problem. I would also appreciate more information about which algorithms and equations are needed to solve these problems. It's been a long time since my university stats class.

There is 1 answer

AudioBubble On

For tests that count events, you could approach this using Beta distributions. Each alternative has some unobserved p, the probability of producing an event. If you observe X positive events out of N trials, then, starting from a uniform prior, your uncertainty about p can be modeled by Beta(X+1, N-X+1).
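As a minimal sketch of that update, assuming a Beta prior (uniform by default) and made-up counts, the posterior parameters and posterior mean can be computed like this. The function names are illustrative and not from any particular library.

```python
def beta_posterior(successes, trials, prior_alpha=1.0, prior_beta=1.0):
    """Return (alpha, beta) of the Beta posterior for an event probability.

    With the default uniform prior Beta(1, 1) this is Beta(X+1, N-X+1).
    """
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")
    alpha = prior_alpha + successes
    beta = prior_beta + (trials - successes)
    return alpha, beta


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Hypothetical data: 30 conversions out of 1000 views.
alpha, beta = beta_posterior(30, 1000)
posterior_mean = beta_mean(alpha, beta)
```

The posterior mean (X+1)/(N+2) sits slightly closer to 0.5 than the raw rate X/N, which matters mostly when N is small.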

You can compare two alternatives by looking at P(pA > pB), where pA and pB follow the Beta distributions of alternatives A and B respectively. Methods for computing that probability can be found in this paper.
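One simple way to approximate P(pA > pB), not necessarily the method from the paper, is Monte Carlo simulation: draw many samples from each Beta distribution and count how often A's sample is larger. The sketch below uses only Python's standard `random.betavariate`; the function name, the number of draws, and the example counts are assumptions for illustration.

```python
import random


def prob_a_beats_b(x_a, n_a, x_b, n_b, draws=100_000, seed=0):
    """Monte Carlo estimate of P(pA > pB) under uniform priors.

    x_* are observed event counts, n_* are the numbers of trials.
    """
    rng = random.Random(seed)  # fixed seed so results are repeatable
    wins = 0
    for _ in range(draws):
        p_a = rng.betavariate(x_a + 1, n_a - x_a + 1)
        p_b = rng.betavariate(x_b + 1, n_b - x_b + 1)
        if p_a > p_b:
            wins += 1
    return wins / draws


# Hypothetical data: A converts 50/1000, B converts 30/1000.
p_a_better = prob_a_beats_b(50, 1000, 30, 1000)
```

The estimate is only accurate to roughly 1/sqrt(draws), so very small or very large probabilities need more draws or an exact method.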

You can also compute E[pA - pB], the expected effect size, or compute an interval estimate (confidence bounds) for it.
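The effect size and its bounds can be approximated with the same kind of simulation. The following is a sketch under the same uniform-prior assumption, with hypothetical names and counts; the bounds are taken as simple percentiles of the simulated differences.

```python
import random


def effect_size_summary(x_a, n_a, x_b, n_b, draws=100_000, level=0.95, seed=0):
    """Simulate pA - pB and return (mean, lower bound, upper bound)."""
    rng = random.Random(seed)
    diffs = sorted(
        rng.betavariate(x_a + 1, n_a - x_a + 1)
        - rng.betavariate(x_b + 1, n_b - x_b + 1)
        for _ in range(draws)
    )
    mean_diff = sum(diffs) / draws
    # Equal-tailed interval: cut (1 - level) / 2 from each end.
    tail = (1.0 - level) / 2.0
    lower = diffs[int(tail * draws)]
    upper = diffs[min(draws - 1, int((1.0 - tail) * draws))]
    return mean_diff, lower, upper


# Hypothetical data: A converts 50/1000, B converts 30/1000.
mean_diff, lower, upper = effect_size_summary(50, 1000, 30, 1000)

# The mean also has a closed form, since a Beta(a, b) has mean a / (a + b):
exact_mean_diff = (50 + 1) / (1000 + 2) - (30 + 1) / (1000 + 2)
```

If the interval excludes zero, that is one informal indication that the difference between the alternatives is not just noise.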