Let's say we have this simple Haskell function that produces Pythagorean triples:
```haskell
pytha :: [(Int, Int, Int)]
pytha = [ (x, y, z)
        | z <- [0..]
        , x <- [1..z]
        , y <- [x..z]
        , x * x + y * y == z * z
        ]
```
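As a quick sanity check (my own snippet, not part of the benchmark), the definition can be sampled directly; the triples come out in order of increasing z:

```haskell
-- Self-contained sanity check: the pytha definition from above, sampled.
pytha :: [(Int, Int, Int)]
pytha = [ (x, y, z)
        | z <- [0..], x <- [1..z], y <- [x..z]
        , x * x + y * y == z * z
        ]

main :: IO ()
main = print (take 3 pytha)
-- prints [(3,4,5),(6,8,10),(5,12,13)]
```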
and we'd like to benchmark how long it takes to produce, say, the first 100 triples. So (using the criterion library and assuming `import Criterion.Main`) we have this benchmark:
```haskell
main :: IO ()
main = do
  countStr <- readFile "count.txt"
  defaultMain
    [ bgroup "pytha"
        [ bench countStr $ nf (`take` pytha) (read countStr) ]
    ]
```
where we even read the count from a file to make sure GHC does not try to evaluate pytha at compile time!
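The concern is that, with a count known at compile time, the needed prefix of pytha might get computed ahead of the timed runs. As a toy illustration of why one might worry about a top-level value being evaluated only once (my own example, unrelated to criterion):

```haskell
import Debug.Trace (trace)

-- A top-level binding like this is a CAF (constant applicative form):
-- it is evaluated at most once, so the trace message appears only on
-- the first use, no matter how often the list is consumed afterwards.
shared :: [Int]
shared = trace "evaluating shared" [1, 2, 3]

main :: IO ()
main = do
  print (sum shared)  -- forces the list; the trace fires here
  print (sum shared)  -- already evaluated; no second trace message
```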
After `echo 100 > count.txt`, compiling the benchmark with -O2 and running it on my machine (a 4.0 GHz Sandy Bridge CPU) shows some interesting numbers:
```
time                 967.4 ns   (957.6 ns .. 979.3 ns)
                     0.999 R²   (0.998 R² .. 0.999 R²)
mean                 979.6 ns   (967.9 ns .. 995.6 ns)
std dev              45.34 ns   (33.96 ns .. 60.29 ns)
```
Slightly modifying this program to show how many triples are considered overall (by producing all candidate triples first, zipping the list with [0..], filtering out the non-Pythagorean triples, and looking at the indices of the remaining ones) shows that almost 900,000 candidates are considered.
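The counting modification described above can be sketched like this (the names `candidates` and `pythIndices` are mine):

```haskell
-- All candidate triples, before the Pythagorean filter.
candidates :: [(Int, Int, Int)]
candidates = [ (x, y, z) | z <- [0..], x <- [1..z], y <- [x..z] ]

-- Indices, within the candidate stream, of the genuinely Pythagorean ones.
pythIndices :: [Int]
pythIndices = [ i | (i, (x, y, z)) <- zip [0..] candidates
                  , x * x + y * y == z * z ]

main :: IO ()
main = print (pythIndices !! 99)  -- candidates examined before the 100th triple
```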
All this naturally raises the question: how does the code above manage to consider roughly 1,000 triples per nanosecond on a single core of a fairly standard CPU? Or is my benchmark just wrong?