Improving efficiency in Stirling numbers calculation

85 views Asked by At

I've been trying to explore the possibilities of recursion in Haskell, but I don't know much about improving efficiency when facing large recursions. Today, I wanted to make a function that calculates Stirling numbers of second kind:

factorial :: Integer -> Integer
factorial n = helper n 1
    where
        helper 1 a = a
        helper n a = helper (n-1) (a*n)


stirling :: Integer -> Integer -> Integer
stirling n 1 = 1
stirling n 2 = ((2 ^n)- 2) `div` 2
stirling n k
    | k > n      = 0
    | k == n     = 1
    | k == (n-1) = (factorial n) `div` (factorial 2 * (factorial(n-2)))
    | otherwise  = stirling (n-1) (k-1) + k *(stirling (n-1) k)

I've been trying to create an auxiliary function, just as in the factorial definition, that could avoid the inconvenience of Haskell´s lazy evaluation taking too many memory resources. I want to do the same, calculating the product in each call of the

k *(stirling (n-1) k)

recursion, but I didn't get the hang of it. Any ideas?

P.D: The code already works quite decently, but I'm new with Haskell and I still don't know what the possibilities are. Any improvement is welcome (although I´d rater not use any library; I want to learn the basics before getting into that).

2

There are 2 answers

3
Daniel Wagner On BEST ANSWER

Your implementation already appears to be fairly memory efficient, at least in my tests. For example:

% time ./test salferdez 32 16
1194461517469807833782085
11.50user 0.00system 0:11.48elapsed 100%CPU (0avgtext+0avgdata 8704maxresident)k
0inputs+0outputs (0major+1267minor)pagefaults 0swaps

Changing the parameters (up or down) doesn't seems to change the 8MB residency much, so I expect that's just about the minimum startup cost of a Haskell program. On the other hand, it seems quite slow. Parameters even slightly larger than the ones I used here -- say, double them both -- took longer than I cared to wait. Simply implementing the formula from Wikipedia directly appears to be quite a lot faster, while retaining a good memory profile:

% time ./test dmwit 32 16
1194461517469807833782076
0.00user 0.00system 0:00.01elapsed 16%CPU (0avgtext+0avgdata 4352maxresident)k
0inputs+0outputs (0major+206minor)pagefaults 0swaps
% time ./test dmwit 10000 5000
<snip'd very very long output>
10.62user 0.02system 0:10.62elapsed 100%CPU (0avgtext+0avgdata 16716maxresident)k
0inputs+0outputs (1major+26643minor)pagefaults 0swaps

(I did not recompile between these runs and the previous ones, so you can be sure we both got the same level of optimization.)

On the other hand, I do get a slightly different answer than you. Perhaps you can spot a bug in one or the other of our code. In any case, here's the formula from wikipedia*, and here's my take on its most direct translation:

stirling n k = sum [(-1)^(k-i)*i^n`div`(product [1..k-i]*product[1..i]) | i <- [0..k]]

Don't forget to compile with -O2 and let GHC do all the dirty work of thinking about how to make it fast and memory-efficient.

* StackOverflow does have a mechanism for including the image directly, rather than linking it, but it doesn't seem to be working for me. Not sure why!

0
Noughtmare On

I can't write a very detailed explanation right now, but you can make it quite a bit faster by:

  • Using Mathematics simplify some formulas
  • Using bit shifts instead of division
  • Using Int for the input instead of Integer

Then you get this:

import Data.Bits

stirling :: Int -> Int -> Integer
stirling n 1 = 1
stirling n 2 = (1 `unsafeShiftL` (n - 1)) - 1
stirling n k
    | k > n        = 0
    | k == n       = 1
    | k == (n - 1) = fromIntegral ((n * (n - 1)) `unsafeShiftR` 1)
    | otherwise    = stirling (n - 1) (k - 1) + fromIntegral k * stirling (n - 1) k

The next thing I would try is dynamic programming to avoid reduntant computation.

And you use the accumulator style to make one of the non tail recursive calls tail recursive like they do in this blog post: https://www.haskell.org/ghc/blog/20190728-free-variable-traversals.html