Benchmarking the following:
#!/usr/bin/env stack
-- stack --resolver lts-16.2 script --package async --package criterion
import Control.Concurrent.Async (async, replicateConcurrently_)
import Control.Monad (replicateM_, void)
import Criterion.Main
main :: IO ()
main = defaultMain [
    bgroup "tests" [ bench "sync" $ nfIO syncTest
                   , bench "async" $ nfIO asyncTest
                   ]
    ]
syncTest :: IO ()
syncTest = replicateM_ 100000 dummy
asyncTest :: IO ()
asyncTest = replicateConcurrently_ 100000 dummy
dummy :: IO Int
dummy = return $ fib 10000000000
fib :: Int -> Int
fib 0 = 1
fib 1 = 1
fib n = fib (n - 1) + fib (n - 2)
Gives me this:
% ./applicative-v-monad.hs
benchmarking tests/sync
time 2.120 ms (2.075 ms .. 2.160 ms)
0.997 R² (0.994 R² .. 0.999 R²)
mean 2.040 ms (2.023 ms .. 2.073 ms)
std dev 77.37 μs (54.96 μs .. 122.8 μs)
variance introduced by outliers: 23% (moderately inflated)
benchmarking tests/async
time 475.3 ms (310.7 ms .. 642.8 ms)
0.984 R² (0.943 R² .. 1.000 R²)
mean 527.2 ms (497.9 ms .. 570.9 ms)
std dev 41.30 ms (4.833 ms .. 52.83 ms)
variance introduced by outliers: 21% (moderately inflated)
It is apparent that asyncTest runs far longer than syncTest.
I would have thought that running expensive actions concurrently would be faster than running them in sequence. Is there some flaw in my reasoning?
There are a few problems with this benchmark.
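First of all, laziness
As @David Fletcher pointed out, you are not forcing the computation of fib, so fib is never actually evaluated and the benchmark measures next to nothing. The fix for this problem would normally be as easy as forcing the result before returning it, for example with $! instead of $.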
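A minimal sketch of such a fix, assuming the strict application operator $! is what forces the result (for an Int, evaluating to weak head normal form means the full value). The names dummyLazy and dummyStrict are hypothetical and only serve to put the two variants side by side:

```haskell
-- Lazy version from the question: `return $ ...` wraps an unevaluated
-- thunk, so fib is never computed when the result is discarded.
dummyLazy :: IO Int
dummyLazy = return $ fib 10000000000

-- Strict version (sketch): `$!` forces `fib ...` to WHNF, which for an
-- Int is the fully evaluated number, before the action returns.
dummyStrict :: IO Int
dummyStrict = return $! fib 10000000000

fib :: Int -> Int
fib 0 = 1
fib 1 = 1
fib n = fib (n - 1) + fib (n - 2)

main :: IO ()
main = do
  _ <- dummyLazy -- returns immediately: the thunk is never forced
  putStrLn "lazy version returned without computing fib"
  -- Running dummyStrict here would really try to compute fib 10000000000.
```

Only the lazy variant is run in main, because the strict one would now actually attempt the whole computation.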
That is enough to make us wait for an eternity, because fib 10000000000 now really gets computed. Lowering the argument to something more manageable is the next thing we should do.
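As a sketch, the same strict dummy with a smaller argument. The value 30 is an assumption chosen for illustration; any argument that finishes in reasonable time would do:

```haskell
-- Sketch: strict dummy with a manageable argument.
-- The argument 30 is a hypothetical choice, not a value from the answer.
dummy :: IO Int
dummy = return $! fib 30

fib :: Int -> Int
fib 0 = 1
fib 1 = 1
fib n = fib (n - 1) + fib (n - 2)

main :: IO ()
main = dummy >>= print
```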
This would normally be enough; however, GHC is too smart: it will see that this computation is really pure and will optimize the loop of 100000 iterations into a single computation, returning the same result 100000 times, so in reality it will compute this fib only once. Instead, let's make fib depend on the iteration number, so that every iteration has its own value to compute.
Next problem is compilation
stack script will run the code interpreted and without the threaded runtime, so your code will run slowly and sequentially. We fix it by compiling manually with some flags that turn on optimizations and the threaded runtime, such as -O2, -threaded and -with-rtsopts=-N. Of course, for a full-blown stack project all these flags go into a cabal file instead, and running stack bench will do the rest.
Last, but not least: too many threads
In the question you have asyncTest = replicateConcurrently_ 100000 dummy. Unless the number of iterations is very low, which it is not, you don't want to use async for this, because spawning at least 100000 threads is not free. It is better to use a work-stealing scheduler for this type of workload. I specifically wrote a library for this purpose: scheduler
Here is an example of how to use it:
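A sketch of what the corrected benchmark could look like with the points above applied. Several things here are assumptions rather than the original code: the input list xs and its range, the name schedulerTest, the file name in the build comment, and the choice of mapConcurrently_ on the async side. traverseConcurrently_ and the Par strategy are, to my understanding, exported by the scheduler package's Control.Scheduler module; check the documentation of the version you install.

```haskell
-- Sketch only. Assumed build command (the file name is hypothetical):
--   ghc -O2 -threaded -rtsopts -with-rtsopts=-N bench-async.hs
-- Requires the async, criterion and scheduler packages.
import Control.Concurrent.Async (mapConcurrently_)
import qualified Control.Scheduler as S
import Criterion.Main

-- Hypothetical inputs: one distinct argument per iteration, intended to
-- keep GHC from sharing a single fib result across all iterations.
xs :: [Int]
xs = [1 .. 30]

main :: IO ()
main = defaultMain
  [ bgroup "tests"
      [ bench "sync" $ nfIO syncTest
      , bench "async" $ nfIO asyncTest
      , bench "scheduler" $ nfIO schedulerTest
      ]
  ]

-- Sequential baseline.
syncTest :: IO ()
syncTest = mapM_ dummy xs

-- One thread per element, via async.
asyncTest :: IO ()
asyncTest = mapConcurrently_ dummy xs

-- A pool of workers (Par uses all available capabilities) shares the jobs.
schedulerTest :: IO ()
schedulerTest = S.traverseConcurrently_ S.Par dummy xs

-- Strict in the result, so fib is evaluated inside the action.
dummy :: Int -> IO Int
dummy n = return $! fib n

fib :: Int -> Int
fib 0 = 1
fib 1 = 1
fib n = fib (n - 1) + fib (n - 2)
```

With only a handful of inputs, the per-thread overhead of async is small; the difference between the async and scheduler variants should matter more as the number of jobs grows toward the 100000 used in the question.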
Now this will give us more sensible numbers: