Let's say we have this simple Haskell function that produces Pythagorean triples:

pytha :: [(Int, Int, Int)]
pytha = [(x, y, z)
        | z <- [0..]
        , x <- [1..z]
        , y <- [x..z]
        , x * x + y * y == z * z
        ]

and we'd like to benchmark how long it takes to produce, say, the first 100 triples. So (using the criterion library and assuming import Criterion.Main) we have this benchmark:

main :: IO ()
main = do
  countStr <- readFile "count.txt"
  defaultMain [ bgroup "pytha" [ bench countStr $ nf (`take` pytha) (read countStr) ] ]

where we even read the count from a file to make sure GHC does not try to evaluate pytha at compile time!
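
For contrast, the version being avoided would look something like the sketch below (hypothetical; with the count baked in as a literal, the worry is that GHC could share or even precompute take 100 pytha instead of measuring it):

import Criterion.Main

main :: IO ()
main = defaultMain [ bgroup "pytha" [ bench "100" $ nf (`take` pytha) 100 ] ]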

Doing echo 100 > count.txt, compiling the benchmark with -O2 and running it on my machine (a 4.0 GHz Sandy Bridge CPU) shows some interesting numbers:

time                 967.4 ns   (957.6 ns .. 979.3 ns)
                     0.999 R²   (0.998 R² .. 0.999 R²)
mean                 979.6 ns   (967.9 ns .. 995.6 ns)
std dev              45.34 ns   (33.96 ns .. 60.29 ns)

Slightly modifying this program to show how many triples were considered overall (by generating all the candidate triples first, zipping the list with [0..], filtering out all non-Pythagorean triples, and looking at the indices of the ones that remain) shows that almost 900000 triples were considered.
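
For concreteness, here is a minimal sketch of that counting variant (candidates and consideredFor are my names, not part of the original program):

-- All candidate triples in the same order as pytha, but without
-- the Pythagorean filter.
candidates :: [(Int, Int, Int)]
candidates = [(x, y, z) | z <- [0..], x <- [1..z], y <- [x..z]]

-- Index within candidates of the n-th actual Pythagorean triple,
-- i.e. how many candidates are examined up to the n-th hit.
consideredFor :: Int -> Int
consideredFor n = indices !! (n - 1)
  where
    indices = [ i
              | (i, (x, y, z)) <- zip [0..] candidates
              , x * x + y * y == z * z
              ]

main :: IO ()
main = print (consideredFor 100)  -- close to the 900000 figure above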

All this naturally raises the question: how does the code above manage to examine nearly 1000 triples per nanosecond on a single core of a pretty standard CPU? Or is it just that my benchmark is wrong?
