I am using rg + perf to measure mmap performance against pread, using minor page faults as a performance indicator. Here are the results:
mmap
perf stat -e major-faults,minor-faults rg -j1 -F 123 a-big-file --mmap
0 major-faults
509 minor-faults
0.002241400 seconds time elapsed
0.000000000 seconds user
0.002221000 seconds sys
no-mmap
perf stat -e major-faults,minor-faults rg -j1 -F 123 a-big-file --no-mmap
0 major-faults
396 minor-faults
0.002911774 seconds time elapsed
0.002890000 seconds user
0.000000000 seconds sys
rg against an empty file
Performance counter stats for 'rg -j1 -F 123 empty_file --mmap':
0 major-faults
393 minor-faults
0.001652534 seconds time elapsed
0.000000000 seconds user
0.001648000 seconds sys
It seems that using mmap causes more minor page faults. Does anyone know how to do deep tracing on Linux, so that the code that incurs the minor page faults can be shown? My current suspicion is munmap.
Using mmap to read a big file normally involves soft (minor) page faults when you first touch the mmapped region, unless you use MAP_POPULATE (which also waits for I/O if the data wasn't already hot in the pagecache, so most programs don't want that).

Fault-around (wiring neighbouring pages into the page tables when one faults) usually makes the fault cost not too bad. The kernel should notice a sequential access pattern in the page faults and wire up multiple pages instead of just the one that faulted.
madvise(MADV_SEQUENTIAL) might help with that, or maybe only with I/O from disk. Doing MADV_POPULATE_READ from another thread might be a good idea; IDK, I haven't tested it. MAP_POPULATE on the initial mmap has the downside of not letting mmap return, so you can't even get started reading the file and can't overlap computation with I/O. But getting the kernel working on checking the pages and wiring them into your page table in parallel with whatever you're doing could be helpful.

perf record -e page-faults should be able to record the faulting instructions. Since your program doesn't trigger any major page faults (the ones that have to sleep for I/O; your big file isn't so big that the kernel can't keep it hot in the pagecache), the only page faults will be the minor ones.

(I didn't test this, and IDK what the default sample granularity is for the page-faults event; IDK whether it would record a sample for every page fault by default.)