PyMC unexpected model output


I'm trying to use PyMC to determine the distribution of ad click-through rates (CTRs). Let's say we have 1000 ads, and I have measurements of clicks and views for each ad. I assume that the underlying distribution of the ad CTRs is a Beta distribution, and I would like to use PyMC to estimate its parameters. In the following snippets I call these parameters unknown_alpha and unknown_beta.

To show my example code, here is how one could generate an example test set:

from scipy.stats import beta
from scipy.stats import geom
from scipy.stats import binom

def generate_example_data(data_size=1000, unknown_alpha=30, unknown_beta=100):
    ctrs = beta.rvs(a=unknown_alpha, b=unknown_beta, size=data_size)

    data_views = geom.rvs(0.001, size=data_size)
    data_clicks = []
    for ctr, views in zip(ctrs, data_views):
        data_clicks.append(binom.rvs(p=ctr, n=views))

    return data_views, data_clicks

And here is how I tried to use PyMC:

import pymc 

def model(data_views, data_clicks):
    ctr_prior = pymc.Beta('ctr_prior', alpha=1.0, beta=1.0)
    views = pymc.Geometric('views', 0.01, observed=True, value=data_views)
    clicks = pymc.Binomial('clicks', n=views, p=ctr_prior, observed=True, value=data_clicks)

    model = pymc.Model([ctr_prior, views, clicks]) 

    mc = pymc.MCMC(model)  
    mc.sample(iter=5000, burn=5000) 

    return mc.trace('ctr_prior')[:]

views, clicks = generate_example_data()
model(views, clicks)

Output: array([ 0.])

I know that the model is not yet complete enough to infer unknown_alpha and unknown_beta, but I don't understand why I just get array([ 0.]). I expected to get a trace with 5000 elements.

Can anybody explain where I went wrong?



There is 1 answer


My guess would be the mc.sample(iter=5000, burn=5000) line. You run the sampler for 5000 iterations and then discard the first 5000 as burn-in, so no samples are left in the trace. To keep 5000 samples, you want mc.sample(iter=10000, burn=5000).
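
The arithmetic behind the iter and burn settings can be sketched in plain Python. This is only an illustration, not PyMC code: retained_samples is a hypothetical helper, and it assumes the sampler keeps one draw per thin iterations once burn-in is over.

```python
def retained_samples(n_iter, n_burn, thin=1):
    """Hypothetical helper: how many draws remain after burn-in and thinning.

    Assumes one draw is kept every `thin` iterations after the first
    `n_burn` iterations have been discarded.
    """
    if thin < 1:
        raise ValueError("thin must be >= 1")
    return max(0, (n_iter - n_burn) // thin)


# Settings from the question: every iteration is discarded as burn-in.
print(retained_samples(n_iter=5000, n_burn=5000))   # prints 0

# Settings suggested in the answer: 5000 draws survive burn-in.
print(retained_samples(n_iter=10000, n_burn=5000))  # prints 5000
```

In other words, iter is the total number of iterations including burn-in, not the number of samples kept.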