I'm trying to find the Python equivalent of R's rbeta() function in order to do some A/B testing. This is the code in R:
a <- 50
notA <- 200
b <- 200
notb <-400
trials <- 100000
alpha <- 1
beta <- 4
a.samples <- rbeta(trials, a+alpha, notA+beta)
b.samples <- rbeta(trials, b+alpha, notB+beta)
a_wins <- sum(a.samples > b.samples) / trials
b_wins <- sum(b.samples > a.samples) / trials
print(a_wins)
print(b_wins)
In this R example, B wins more than 99% of the time.
This is what I'm trying as the Python equivalent:
import scipy.stats as stats
a = 50
notA = 200
b = 200
notB =400
trials = 100000
alpha = 1
beta = 4
# Random data on which to calculate probability density
inputs = []
inputs = stats.beta(alpha, beta).rvs(size=trials)
aSamples = stats.beta(a+alpha, notA+beta).pdf(inputs)
bSamples = stats.beta(b+alpha, notB+beta).pdf(inputs)
aWins = sum(aSamples > bSamples) / trials
bWins = sum(bSamples > aSamples) / trials
print("A", aWins)
print("B", bWins)
In the Python equivalent, A wins 75% of the time.
My guess is that the problem arises with inputs, the random stats on which the probability density is calculated. They're generated here with the .rvs() method of scipy.stats.beta. I've also tried the random number module and np.linspace() to no avail. Can someone tell me what I'm missing?
@user20650 solved it in the comments. Calling .pdf() is a redundant step. The function scipy.stats.beta().rvs() is the Python equivalent of R's rbeta() function. The correct code then should read:
aSamples = stats.beta(a+alpha, notA+beta).rvs(trials)
bSamples = stats.beta(b+alpha, notB+beta).rvs(trials)
Thank you everyone who replied. This was melting my head all day yesterday.