I used the Criterion library to write benchmarks for my Haskell functions. Now I am implementing the same algorithm in C to compare its performance with the Haskell version. The question is: how can I do that reliably? Criterion does a lot of fancy stuff, like accounting for clock call overhead and doing statistical analysis of the results. I guess that if I just measure the time taken by my C function, the result will not be comparable with the numbers returned by Criterion. In his original post about Criterion, Bryan O'Sullivan writes: "It should even be easy to use criterion to benchmark C code and command line programs." The question is: how? Takayuki Muranushi compares a C implementation of the DFT with Haskell by spawning threads and calling the executable, but I fear that this adds a lot of extra overhead (creating a new thread, running the application, writing to stdout and then reading from it), which makes the results incomparable. I also considered using the FFI, but again I fear that the additional overhead would make such a comparison unfair.
If there is no way to use Criterion to reliably benchmark C, what approaches to C benchmarking would you recommend? I've read some questions here on SO, and it seems that there are many different functions for measuring system time, but they either provide time only with millisecond resolution or have a large call overhead.
FFI can be used in such a way that it doesn't add much overhead. Consider the following program (full code available here):
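The original listing isn't reproduced above, but a minimal sketch of that kind of setup looks roughly like this (the C stub and the name `c_nop` are placeholders of mine, not the original code):

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
module Main where

import Criterion.Main

-- Hypothetical C function under test, e.g. in nop.c:
--   void nop(void) {}
-- Marking the import 'unsafe' tells GHC to compile it as a plain
-- function call, without the bookkeeping a 'safe' call needs.
foreign import ccall unsafe "nop" c_nop :: IO ()

main :: IO ()
main = defaultMain [ bench "nop" $ whnfIO c_nop ]
```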
The C call is compiled to the following Cmm:
Here's the assembly:
So, if you mark your C import as `unsafe` and do all marshalling before the measurement, your C call will be basically just an inline `call` instruction - the same as if you were doing all the benchmarking in C.

Here's what Criterion reports when I benchmark a C function that does nothing:

This is approximately 400 times smaller than the estimated clock resolution on my machine (~5.5 us). For comparison, here's the benchmark data for a function that computes the arithmetic mean of 100 integers:
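That report isn't reproduced above either, but a hedged sketch of how such a mean benchmark might be wired up, with the array marshalled once before measurement so only the foreign call itself is timed (the `mean` C function and the `c_mean` import are my own placeholders):

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
module Main where

import Criterion.Main
import Foreign.C.Types (CDouble, CInt)
import Foreign.Marshal.Array (newArray)
import Foreign.Ptr (Ptr)

-- Hypothetical C function, e.g.:
--   double mean(const int *xs, int n) { ... }
foreign import ccall unsafe "mean"
    c_mean :: Ptr CInt -> CInt -> IO CDouble

main :: IO ()
main = do
    -- Marshal the input once, outside the benchmark loop, so the
    -- measured code is just the unsafe foreign call.
    xs <- newArray [1 .. 100 :: CInt]
    defaultMain [ bench "C mean of 100 ints" $ whnfIO (c_mean xs 100) ]
```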