Error running PyMC - Stochastic value is outside its support, or it forbids its parents' current values

1.3k views Asked by At

I am trying to use PyMC to take as input four different predictors of a medical condition, and combine them to yield an overall posterior probability that the patient has the condition given the subset of predictors that say "yes, this patient has this condition".

The idea is to select a theta (overall rate of the condition) and false negative and false positive rate for each predictor from a beta distribution, and from that calculate a marginal probability and the posterior probability using Bayes theorem. I have an array of 16 observed values for counts, one for every possible combination of predictors (since there are 4 predictors, there are 2**4 = 16 different possible combinations of predictors). I am feeding this last set of counts and the marginal probabilities into a multinomial distribution, analogous to how disasters_array is used with a Poisson distribution in the following example from the PyMC tutorial http://pymc-devs.github.io/pymc/tutorial.html.

Here's the code I wrote to try and do this:

from pymc import Multinomial, Beta, deterministic
from numpy import array

n = 4 # number of predictors

counts_array = array([2942808473, 17491655, 21576, 23189, 339805, 89159, 168214, 76044, 43138288, 530963, 22682, 22169, 462052, 129472, 2804257, 3454104]) # 16 counts - one count value for each possible permutation of predictors that detected medical condition
pred = array([[0,0,0,0],[1,0,0,0],[0,1,0,0],[1,1,0,0],[0,0,1,0],[1,0,1,0],[0,1,1,0],[1,1,1,0],[0,0,0,1],[1,0,0,1],[0,1,0,1],[1,1,0,1],[0,0,1,1],[1,0,1,1],[0,1,1,1],[1,1,1,1]]); # array of yes/no's from predictors for each value in counts_array above

theta = Beta('theta',1,2)

fn = fp = tn = tp = [0] * 4;
for i in range(0,n):
    fn[i] = Beta('fn' + str(i),1,2)
    fp[i] = Beta('fp' + str(i),1,2)
    tn[i] = 1 - fp[i]
    tp[i] = 1 - fn[i]

@deterministic(plot=False)
def margprobs():
    mp = [0] * 2**n; # initialize with vector of 2**n zeros
    for i in range(0,2**n):
        mp[i] = theta *\
            (fn[0]**(1-pred[i][0])) * (tp[0]**pred[i][0]) *\
            (fn[1]**(1-pred[i][1])) * (tp[1]**pred[i][1]) *\
            (fn[2]**(1-pred[i][2])) * (tp[2]**pred[i][2]) *\
            (fn[3]**(1-pred[i][3])) * (tp[3]**pred[i][3])\
            + (1-theta) *\
            (tn[0]**(1-pred[i][0])) * (fp[0]**pred[i][0]) *\
            (tn[1]**(1-pred[i][1])) * (fp[1]**pred[i][1]) *\
            (tn[2]**(1-pred[i][2])) * (fp[2]**pred[i][2]) *\
            (tn[3]**(1-pred[i][3])) * (fp[3]**pred[i][3]);
    return mp;

@deterministic(plot=False)
def postprobs():
    pp = [0] * 2**n; # initialize with vector of 2**n zeros
    for i in range(0,2**n):
        pp[i] = theta *\
            (fn[0]**(1-pred[i][0])) * (tp[0]**pred[i][0]) *\
            (fn[1]**(1-pred[i][1])) * (tp[1]**pred[i][1]) *\
            (fn[2]**(1-pred[i][2])) * (tp[2]**pred[i][2]) *\
            (fn[3]**(1-pred[i][3])) * (tp[3]**pred[i][3])\
            / margprobs[i];
    return pp;

counts = Multinomial(name="counts", value=counts_array, n=2**n, p=margprobs, observed=True)

When I run this, I get an error to do with the last line, when calculating counts:

$ python test.py
Traceback (most recent call last):
  File "test.py", line 46, in <module>
    counts = Multinomial(name="counts", value=counts_array, n=2**n, p=margprobs, observed=True)
  File "/Users/jtr4v/anaconda/lib/python2.7/site-packages/pymc/distributions.py", line 3269, in __init__
    verbose=verbose, **kwds)
  File "/Users/jtr4v/anaconda/lib/python2.7/site-packages/pymc/PyMCObjects.py", line 772, in __init__
    if not isinstance(self.logp, float):
  File "/Users/jtr4v/anaconda/lib/python2.7/site-packages/pymc/PyMCObjects.py", line 929, in get_logp
    raise ZeroProbability(self.errmsg)
pymc.Node.ZeroProbability: Stochastic counts's value is outside its support,
 or it forbids its parents' current values

Obviously this error is to do with PyMC not liking some value I'm feeding to Multinomial(), but I not sure which one is wrong. I believe that value should be counts_array (my observed values for count), n should be 16 since I want to select an array of 16 items for count, one for each possible combination of predictors, and p should be my marginal probabilities, and observed should be true since I have observed values.

What am I doing wrong?

Edit: If it helps, I was previously doing this in R2jags with the following code:

model {
    theta ~ dbeta(1,2); # draw theta from a beta distribution
    for (i in 1:N) { # draw an false positive and false negative rate for each of N predictors
        fp[i] ~ dbeta(1,2);
        fn[i] ~ dbeta(1,2);
        tp[i] <- 1-fn[i]; # true positive = 1 - false negative rate
        tn[i] <- 1-fp[i]; # true negative rate = 1 - false positive rate
    }
    for (j in 1:M) {
    # Bayes theorem, i.e.
    # posterior probability =
    # P(A) * P(B|A) /
    # /
    # P(A) * P(B|A) + P(-A) * P(B|-A)  # <--- last line is marginal probability
    #
    # x is a vector of 1's and 0's indicating whether the ith predictor said yes or no
    margprobs[j] <- (theta *
                        (fn[1]^(1-x[j,1])) * (tp[1]^x[j,1]) *
                        (fn[2]^(1-x[j,2])) * (tp[2]^x[j,2]) *
                     (fn[3]^(1-x[j,3])) * (tp[3]^x[j,3]) *
                     (fn[4]^(1-x[j,4])) * (tp[4]^x[j,4])
                 + (1-theta) *
                        (tn[1]^(1-x[j,1])) * (fp[1]^x[j,1]) *
                     (tn[2]^(1-x[j,2])) * (fp[2]^x[j,2]) *
                     (tn[3]^(1-x[j,3])) * (fp[3]^x[j,3]) *
                     (tn[4]^(1-x[j,4])) * (fp[4]^x[j,4]));

    postprobs[j] <- theta *
                        (fn[1]^(1-x[j,1])) * (tp[1]^x[j,1]) *
                        (fn[2]^(1-x[j,2])) * (tp[2]^x[j,2]) *
                        (fn[3]^(1-x[j,3])) * (tp[3]^x[j,3]) *
                        (fn[4]^(1-x[j,4])) * (tp[4]^x[j,4])
                        / margprobs[j];


    }
    counts ~ dmulti(margprobs, total);
}
0

There are 0 answers