Probability mass of summing two discrete random variables, in linearithmic time


Given two independent discrete random variables A and B, their (arbitrary) probability mass functions a and b, and a natural number N such that both variables have the domain [0..N] (so the functions can be represented as arrays), the probability of a given sum, i.e. P(A+B == target), can be computed in O(N) time: treat the arrays as vectors and take their dot product, with one input reversed and both inputs re-sliced so that they align and no index falls out of bounds. Each position i of a is thereby matched with the position j of b such that i+j == target. Such an algorithm looks something like this:

import qualified Data.Vector as V
import Data.Vector (Vector)
import Data.Ratio (Ratio, (%))

-- same runtime as dotProduct and sum; other components are O(1)
p :: Vector Int -> Vector Int -> Int -> Ratio Int
p a b target
    | V.length a /= V.length b = error "p: mismatched lengths"
    | 0 <= target && target <= 2 * (V.length a - 1)
    = dotProduct (shift target a) (V.reverse (shift target b))
      %
      (V.sum a * V.sum b) -- O(length a)
    -- == sum [ a!i * b!(target-i) | i <- [max 0 (target-n) .. min n target] ]
    --    where n = V.length a - 1
    | otherwise = 0

    where
        -- O(1): slicing a Vector is constant-time
        shift t v = V.slice start' len' v
            where start  = t - V.length v + 1
                  len    = V.length v - abs start
                  -- unlike `drop ... $ take ... v`, V.slice errors (rather
                  -- than acting as `id`) on out-of-bounds indices, so clamp
                  start' = min (V.length v) (max 0 start)
                  len'   = min (V.length v - start') (max 0 len)

        -- usual linear-algebra definition
        -- O(length a); length inequality already guarded away by the caller
        dotProduct u v = V.sum (V.zipWith (*) u v)
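
For concreteness, here is a quick sanity check in GHCi (the die value below is illustrative, not part of the problem statement): two independent uniform variables on [0..5], i.e. zero-indexed fair dice, should sum to 5 with probability 6/36 = 1/6.

-- a uniform variable on [0..5]; unnormalised weights are fine,
-- since p divides by (sum a * sum b)
die :: Vector Int
die = V.replicate 6 1

-- ghci> p die die 5
-- 1 % 6
-- ghci> p die die 0
-- 1 % 36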

Given the same information, one might treat the variables' sum as its own discrete random variable, albeit one whose probability mass function is unknown. Evaluating this probability mass function in its entirety (thereby producing the corresponding array) can be done in O(N²) time by performing one dot product per possible sum, 2N+1 of them, each with its operands shifted differently; i.e.:

-- O(N^2) overall: one O(N) dot product for each of the 2N+1 possible sums
pm :: Vector Int -> Vector Int -> Vector (Ratio Int)
pm a b = V.map (p a b) (V.generate (2 * V.length a - 1) id)
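
Continuing the illustrative dice example, pm produces the whole table of 2N+1 = 11 probabilities at once:

-- ghci> pm die die
-- [1 % 36, 1 % 18, 1 % 12, 1 % 9, 5 % 36, 1 % 6,
--  5 % 36, 1 % 9, 1 % 12, 1 % 18, 1 % 36]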

I am told, however, that this table of values can actually be produced in O(N*log(N)) time. As far as I can tell, no two of the multiplications across all of the involved dot products share the same ordered pair of indices, and I do not see how to, e.g., combine two dot-subproducts in any useful way to form a T(n)=2T(n/2)+O(n)-type recursion; I am therefore curious how and why, exactly, such a runtime is possible.

1 Answer

Accepted answer, by vib:

In a nutshell, you have a transformation F (the discrete Fourier transform) that maps the set of vectors of size N onto itself and satisfies

F(a*b) = F(a).F(b)

where * is the convolution operator you just described and . is the pointwise (componentwise) product. Note that it is not the dot product: a dot product would collapse the vectors to a single scalar, whereas here both sides are vectors of size N.

Moreover, F is invertible, so you can recover a*b as

a*b = F^{-1}(F(a).F(b))

Now this is all very nice, but the key point is that F (and F^{-1}) can be computed in O(N log N) time using the Fast Fourier Transform (FFT). Since the pointwise product can be computed in O(N), you obtain an O(N log N) algorithm for computing the convolution of the two distributions. One practical caveat: the DFT actually yields a circular convolution, so to get the linear convolution you want, zero-pad both vectors to length at least 2N-1 before transforming.
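
To make this concrete, below is a minimal self-contained sketch in Haskell (plain lists rather than Vector, for brevity). The names fft, ifft, convolve, and pmFFT are mine, not from the question, and the radix-2 Cooley-Tukey recursion assumes the padded length is a power of two; production code would use an FFT library instead. Also note that the floating-point round trip makes the result an approximate [Double] rather than an exact Ratio Int.

import Data.Complex (Complex(..), cis, conjugate, realPart)

-- radix-2 Cooley-Tukey FFT; assumes the input length is a power of two
fft :: [Complex Double] -> [Complex Double]
fft [x] = [x]
fft xs  = zipWith (+) es ts ++ zipWith (-) es ts
    where (evens, odds) = deinterleave xs
          es = fft evens
          n  = length xs
          -- twiddle factors e^(-2*pi*i*k/n), applied to the odd half
          ts = zipWith (*)
                   [cis (-2 * pi * fromIntegral k / fromIntegral n)
                       | k <- [0 .. n `div` 2 - 1]]
                   (fft odds)

deinterleave :: [a] -> ([a], [a])
deinterleave (x:y:rest) = let (as, bs) = deinterleave rest
                          in (x : as, y : bs)
deinterleave xs         = (xs, [])

-- inverse transform via conjugation: F^{-1}(x) = conj (F (conj x)) / n
ifft :: [Complex Double] -> [Complex Double]
ifft xs = map ((/ fromIntegral (length xs)) . conjugate)
              (fft (map conjugate xs))

-- linear convolution: zero-pad to the next power of two that is at least
-- 2N-1, so the DFT's circular convolution agrees with the linear one
convolve :: [Double] -> [Double] -> [Double]
convolve a b = take outLen (map realPart (ifft (zipWith (*) fa fb)))
    where outLen = length a + length b - 1
          m      = head (dropWhile (< outLen) (iterate (* 2) 1))
          pad xs = map (:+ 0) xs ++ replicate (m - length xs) 0
          fa     = fft (pad a)
          fb     = fft (pad b)

-- O(N log N) analogue of pm, up to floating-point rounding
pmFFT :: [Int] -> [Int] -> [Double]
pmFFT a b = map (/ (sum a' * sum b')) (convolve a' b')
    where a' = map fromIntegral a
          b' = map fromIntegral b

On the dice example from the question, pmFFT (replicate 6 1) (replicate 6 1) reproduces the table that pm gives, up to rounding (e.g. 0.0277... where pm gives 1 % 36).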

I therefore suggest you look up the convolution theorem and the Fast Fourier Transform.