The precision of scipy.special.gammaln

The Problem

Much of my programming involves the statistics functions in scipy.stats. A new problem required computing the pmf of the beta-binomial distribution. Because it has an analytic form but doesn't appear in scipy.stats, I had to define a function for its pmf myself. I am using scipy version 0.12.0 and numpy version 1.7.0.

import numpy
from scipy.special import gammaln, betaln

def beta_binomial_pmf(k, n, K, N):
    # compute natural log of pmf
    ln_pmf = ( gammaln(n+1) - gammaln(k+1) - gammaln(n-k+1) ) \
        - betaln(K+1, N-K+1) + betaln(K+k+1, N-K+n-k+1)
    return numpy.exp(ln_pmf)

In the statistics problem I am trying to solve, the values of n and k normally range between 0 and 100, while K and N can be as large as 1e9. My issue is that this function returns the same value for different inputs.

Examples

k = 0
n = 5
K = numpy.array([12, 10, 8])
N = 101677958
beta_binomial_pmf(k, n, K, N)

The resulting array is

array([ 0.99999928,  0.99999905,  0.99999928])

which is rather odd given that each value of K is different. To get a better sense of the similarity between the first and third values in the array:

1 - beta_binomial_pmf(k, n, K, N)
array([  7.15255482e-07,   9.53673862e-07,   7.15255482e-07])

A really simple test of the precision of gammaln is to compute 1 - (Gamma(N+1)/Gamma(N))/N. It is useful because the result should be exactly 0: Gamma(N+1) = N*Gamma(N), as you can verify on paper.

N = numpy.logspace(0, 10, 11)
1 - numpy.exp(gammaln(N+1) - gammaln(N)) / N
array([  0.00000000e+00,  -1.11022302e-15,   1.90958360e-14,
        -9.94537785e-13,  -4.96402919e-12,   7.74684761e-11,
        -1.70086167e-13,   1.45905219e-08,   2.21033640e-07,
        -7.64616381e-07,   2.54126535e-06])

Question

I recognize that there is a limit to the precision one can compute with, but what happens around N=1e7 that makes the precision of gammaln change by five orders of magnitude? Any suggestions on how to get around this problem?

There is one answer, by pv.:

Your problem has to do with loss of floating-point precision in subtraction; in fact it does not depend on the precision of Scipy's gammaln and betaln. The issue is that for large N, gammaln(N+1) is of the same order of magnitude as gammaln(N), but much bigger than their difference gammaln(N+1) - gammaln(N). As a consequence, when you compute the difference, you lose roughly log10(gammaln(N)) digits of precision to the cancellation. This is a general issue with floating-point arithmetic.
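To see the cancellation in isolation (a minimal sketch, not from the original answer), compare the subtracted difference against log(N), which is what gammaln(N+1) - gammaln(N) equals analytically:

import numpy
from scipy.special import gammaln

# Both terms are ~1.7e9 for N = 1e8, so their double-precision
# difference keeps only the digits that survive the cancellation.
N = 1e8
diff = gammaln(N + 1) - gammaln(N)   # computed by subtraction
exact = numpy.log(N)                 # analytically exact value
print(abs(diff - exact) / exact)     # ~1e-8, far above machine epsilon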

You can work around this via asymptotic expansions (cf. the betaln implementation, which has to deal with the same issue). Namely, you can use an expansion for lngamma(a + b) - lngamma(a) valid for a >> |b|, 1. In Sympy:

In [43]: from sympy import *

In [44]: def lnstirling3(z): return (z - sympify('1/2'))*log(z) - z + log(sqrt(2*pi)) + 1/(12*z) - 1/(360*z*z*z)

In [45]: a, b = symbols('a, b')

In [46]: (lnstirling3(a + b) - lnstirling3(a)).series(a, oo, 4)

Out[46]: (b**4/12 - b**3/6 + b**2/12)/a**3 + (-b**3/6 + b**2/4 - b/12)/a**2 + (b**2/2 - b/2)/a - b*log(1/a) + O(a**(-4), (a, oo))

Asymptotic formulas for your pmf can be derived in the same way, and used in place of the exact expression when the parameters have large values; see the sketch below.
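For illustration, here is a minimal sketch of that approach (the function names, the cutoff a > 1e6, and the truncation order are my choices, not from the answer). It rewrites the betaln difference via betaln(x, y) = gammaln(x) + gammaln(y) - gammaln(x + y), so that only differences of the form gammaln(a + b) - gammaln(a) remain, and evaluates those with the series above when a is large:

import numpy
from scipy.special import gammaln

def gammaln_diff(a, b):
    # gammaln(a + b) - gammaln(a), stable for a >> |b|, 1.
    # Series taken from the sympy expansion above (-b*log(1/a) = b*log(a)).
    a = numpy.asarray(a, dtype=float)
    b = numpy.asarray(b, dtype=float)
    series = (b * numpy.log(a)
              + (b**2/2 - b/2) / a
              + (-b**3/6 + b**2/4 - b/12) / a**2
              + (b**4/12 - b**3/6 + b**2/12) / a**3)
    direct = gammaln(a + b) - gammaln(a)   # fine when a is moderate
    return numpy.where(a > 1e6, series, direct)   # heuristic cutoff

def beta_binomial_pmf_stable(k, n, K, N):
    # The binomial coefficient only involves small arguments.
    ln_binom = gammaln(n + 1) - gammaln(k + 1) - gammaln(n - k + 1)
    # betaln(K+k+1, N-K+n-k+1) - betaln(K+1, N-K+1) regrouped via
    # betaln(x, y) = gammaln(x) + gammaln(y) - gammaln(x + y) into
    # three gammaln differences with small second arguments.
    ln_beta_ratio = (gammaln_diff(K + 1, k)
                     + gammaln_diff(N - K + 1, n - k)
                     - gammaln_diff(N + 2, n))
    return numpy.exp(ln_binom + ln_beta_ratio)

With k and n of order 100 but K and N of order 1e9, each gammaln_diff call then either has a small first argument (safe to subtract directly) or a huge one (handled by the series), so no two ~1e10-sized numbers are ever subtracted.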

EDIT: if you are feeling lazy, you can use the original formula together with mpmath and turn on higher precision via mpmath.mp.dps. Just be sure to cast k, n, K, N to mpmath.mpf before summing them.
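For instance, a minimal sketch along those lines (the dps value is arbitrary; mpmath.loggamma plays the role of gammaln, and betaln is rebuilt from it):

import mpmath

mpmath.mp.dps = 50  # work with 50 significant decimal digits

def betaln_mp(a, b):
    # betaln in arbitrary precision, via loggamma
    return mpmath.loggamma(a) + mpmath.loggamma(b) - mpmath.loggamma(a + b)

def beta_binomial_pmf_mp(k, n, K, N):
    # Cast to mpf *before* any arithmetic, so sums like N-K+n-k+1
    # are never rounded in double precision.
    k, n, K, N = (mpmath.mpf(x) for x in (k, n, K, N))
    ln_pmf = (mpmath.loggamma(n + 1) - mpmath.loggamma(k + 1)
              - mpmath.loggamma(n - k + 1)
              - betaln_mp(K + 1, N - K + 1)
              + betaln_mp(K + k + 1, N - K + n - k + 1))
    return mpmath.exp(ln_pmf)

Convert the result back with float(...) once the cancellation-prone arithmetic is done.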