Magic numbers in C++ implementation of Excel NORMDIST function

3.1k views Asked by At

Whilst looking for a C++ implementation of Excel's NORMDIST (cumulative) function I found this on a website:

static double normdist(double x, double mean, double standard_dev)
{
    double res;
    double x=(x - mean) / standard_dev;
    if (x == 0)
    {
        res=0.5;
    }
    else
    {
        double oor2pi = 1/(sqrt(double(2) * 3.14159265358979323846));
        double t = 1 / (double(1) + 0.2316419 * fabs(x));
        t *= oor2pi * exp(-0.5 * x * x) 
             * (0.31938153   + t 
             * (-0.356563782 + t
             * (1.781477937  + t 
             * (-1.821255978 + t * 1.330274429))));
        if (x >= 0)
        {
            res = double(1) - t;
        }
        else
        {
            res = t;
        }
    }
    return res;
}

My limited maths knowledge made me think about Taylor series, but I am unable to determine where these numbers come from:

0.2316419, 0.31938153, -0.356563782, 1.781477937, -1.821255978, 1.330274429

Can anyone suggest where they come from, and how they can be derived?

1

There are 1 answers

0
Alexandre C. On BEST ANSWER

Check out Numerical Recipes, chapter 6.2.2. The approximation is standard. Recall that

NormCdf(x) = 0.5 * (1 + erf(x / sqrt(2)))
erf(x) = 2 / (sqrt(pi)) integral(e^(-t^2) dt, t = 0..x)

and write erf as

1 - erf x ~= t * exp(-x^2 + P(t))

for positive x, where

t = 2 / (2 + x)

and since t is between 0 and 1, you can find P by Chebyshev approximation once and for all (Numerical Recipes, section 5.8). You don't use Taylor expansion: you want the approximation to be good in the whole real line, what Taylor expansion cannot guarantee. Chebyshev approximation is the best polynomial approximation in the L^2 norm, which is a good substitute to the very difficult to find minimax polynomial (= best polynomial approximation in the sup norm).

The version here is slightly different. Instead, one writes

1 - erf x = t * exp(-x^2) * P(t)

but the procedure is similar, and normCdf is computed directly, instead of erf.

In particular and very similarly 'the implementation' that you are using differs somewhat from the one that handles in the text, because it is of the form b*exp(-a*z^2)*y(t) but it´s also a Chevishev approx. to the erfc(x) function as you can see in this paper of Schonfelder(1978)[http://www.ams.org/journals/mcom/1978-32-144/S0025-5718-1978-0494846-8/S0025-5718-1978-0494846-8.pdf ]

Also in Numerical Recipes 3rd edition, at the final of the chapter 6.2.2 they provide a C implementation very accurate of the type t*exp(-z^2 + c0 + c1*t+ c2t^2 + c3*t^3 + ... + c9t^9)