How long does it take to create 1 million threads in Haskell?

As I understand it, Haskell has green threads. But how lightweight are they? Is it possible to create 1 million threads?

Or how long would it take to create 100,000 threads?

There are 4 answers

3
Justin On

Well, according to the linked source, the default stack size is 1k, so I suppose in theory it would be possible to create 1,000,000 threads: the stacks alone would take up around 1 GB of memory.
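The estimate above is simple arithmetic, and a minimal sketch of it might look like the following. The function name stackBytes is hypothetical, and the sketch assumes that "1k" means 1024 bytes and that only the initial stack is counted. The real per-thread cost is likely higher, since each thread also has its own bookkeeping data and stacks can grow, so treat the result as a rough lower bound.

```haskell
-- | Rough lower bound on stack memory: one initial stack per thread.
-- Assumption: nothing but the initial stack is counted here.
stackBytes :: Integer -> Integer -> Integer
stackBytes threads bytesPerStack = threads * bytesPerStack

main :: IO ()
main = do
  -- Assumption: the default initial stack of "1k" is 1024 bytes.
  let total = stackBytes 1000000 1024
  putStrLn (show total ++ " bytes")
  -- Convert to GiB (2^30 bytes) for comparison with installed RAM.
  putStrLn (show (fromIntegral total / (1024 ^ 3 :: Double)) ++ " GiB")
```

With these assumptions the total comes to 1,024,000,000 bytes, which is a little under 1 GiB and matches the "around 1 GB" figure.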

0
Don Stewart On

Using the benchmark here, http://www.reddit.com/r/programming/comments/a4n7s/stackless_python_outperforms_googles_go/c0ftumi

You can improve the performance on a per-benchmark basis by shrinking the thread stack size to one that fits the benchmark. E.g. 1M threads, with a 512-byte stack per thread, takes 2.7s:

$ time ./A +RTS -s -k0.5k
1
barkmadley On

from here.

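import Control.Concurrent
import Control.Monad

-- Number of threads in the chain.
n :: Int
n = 100000

main :: IO ()
main = do
    left  <- newEmptyMVar
    -- Build a chain of n threads, each linked to its neighbours by MVars.
    right <- foldM make left [0..n-1]
    putMVar right 0    -- bang!
    x <- takeMVar left -- wait for completion
    print x
 where
    -- Fork one thread that reads from a fresh MVar r and writes to l.
    make l i = do
       r <- newEmptyMVar
       _ <- forkIO (thread i l r)
       return r

-- Take a value from the right, add one, and pass it on to the left.
thread :: Int -> MVar Int -> MVar Int -> IO ()
thread _ l r = do
   v <- takeMVar r
   putMVar l $! v+1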

On my not-quite-2.5 GHz laptop this takes less than a second.

Set n to 1000000 and it becomes hard to write the rest of this post because the OS is paging like crazy. It is definitely using more than a gigabyte of RAM (I didn't let it finish). If you have enough RAM, it should work in approximately 10x the time of the 100000 version.
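To check the "roughly 10x" scaling on your own machine, one option is to time thread creation for several sizes within a single run. The following is only a sketch under some assumptions: spawnAndJoin and timed are hypothetical helper names, the threads do trivial work (each just fills its own MVar) rather than forming the chain above, and GHC.Clock.getMonotonicTime requires a reasonably recent base (4.11 or later).

```haskell
import Control.Concurrent (forkIO, newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM_, replicateM)
import GHC.Clock (getMonotonicTime)

-- | Fork n green threads that each signal their own MVar,
-- wait for all of them, and return how many were joined.
spawnAndJoin :: Int -> IO Int
spawnAndJoin n = do
  dones <- replicateM n $ do
    done <- newEmptyMVar
    _ <- forkIO (putMVar done ())
    return done
  mapM_ takeMVar dones   -- block until every thread has signalled
  return (length dones)

-- | Run an action and return its result with the elapsed wall-clock seconds.
timed :: IO a -> IO (a, Double)
timed act = do
  t0 <- getMonotonicTime
  x  <- act
  t1 <- getMonotonicTime
  return (x, t1 - t0)

main :: IO ()
main = forM_ [1000, 10000, 100000] $ \n -> do
  (k, secs) <- timed (spawnAndJoin n)
  putStrLn (show k ++ " threads: " ++ show secs ++ " s")
```

Timings will vary with the machine, the GHC version, and the RTS options in use, so the numbers are only meaningful relative to each other.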

0
marni On

For this synthetic test case, involving hardware (OS) threads via the threaded runtime results in significant overhead, so working with green threads alone looks like the preferred option. Note that spawning green threads in Haskell is indeed cheap. I've re-run the above program with n = 1000000 on a MacBook Pro (i7, 8 GB of RAM), using:

$ ghc --version
The Glorious Glasgow Haskell Compilation System, version 7.6.3

Compiled with -threaded and -rtsopts:

$ time ./thr
1000000

 real   0m5.974s
 user   0m3.748s
 sys    0m2.406s

Reducing the stack helps a bit:

$ time ./thr +RTS -k0.5k
1000000

 real   0m4.804s
 user   0m3.090s
 sys    0m1.923s

Then, compiled without -threaded:

$ time ./thr
1000000

 real   0m2.861s
 user   0m2.283s
 sys    0m0.572s

And finally, without -threaded and with reduced stack:

$ time ./thr +RTS -k0.5k
1000000

 real   0m2.606s
 user   0m2.198s
 sys    0m0.404s