I'm trying to find the Python equivalent of R's rbeta()
function in order to do some A/B testing. This is the code in R
:
a <- 50
notA <- 200
b <- 200
notb <-400
trials <- 100000
alpha <- 1
beta <- 4
a.samples <- rbeta(trials, a+alpha, notA+beta)
b.samples <- rbeta(trials, b+alpha, notB+beta)
a_wins <- sum(a.samples > b.samples) / trials
b_wins <- sum(b.samples > a.samples) / trials
print(a_wins)
print(b_wins)
In this R
example, B wins more than 99% of the time.
This is what I'm trying as the Python equivalent:
import scipy.stats as stats
a = 50
notA = 200
b = 200
notB =400
trials = 100000
alpha = 1
beta = 4
# Random data on which to calculate probability density
inputs = []
inputs = stats.beta(alpha, beta).rvs(size=trials)
aSamples = stats.beta(a+alpha, notA+beta).pdf(inputs)
bSamples = stats.beta(b+alpha, notB+beta).pdf(inputs)
aWins = sum(aSamples > bSamples) / trials
bWins = sum(bSamples > aSamples) / trials
print("A", aWins)
print("B", bWins)
In the Python equivalent, A wins 75% of the time.
My guess is that the problem arises with inputs
, the random stats on which the probability density is calculated. They're generated here with the .rvs()
method of scipy.stats.beta
. I've also tried the random number module and np.linspace()
to no avail. Can someone tell me what I'm missing?
@user20650 solved it in the comments. Calling .pdf()
is a redundant step. The function scipy.stats.beta().rvs()
is the Python equivalent of R's rbeta()
function. The correct code then should read:
aSamples = stats.beta(a+alpha, notA+beta).rvs(trials)
bSamples = stats.beta(b+alpha, notB+beta).rvs(trials)
Thank you everyone who replied. This was melting my head all day yesterday.