Why does using rg (ripgrep) with mmap trigger more minor page faults?


I am using rg + perf to compare mmap performance against pread, using the number of minor page faults as a performance indicator. Here are the results:

mmap

perf stat -e major-faults,minor-faults  rg -j1 -F 123 a-big-file --mmap
             0      major-faults
           509      minor-faults

   0.002241400 seconds time elapsed

   0.000000000 seconds user
   0.002221000 seconds sys

no-mmap

perf stat -e major-faults,minor-faults  rg -j1 -F 123 a-big-file --no-mmap
             0      major-faults
           396      minor-faults

   0.002911774 seconds time elapsed

   0.002890000 seconds user
   0.000000000 seconds sys

grep against an empty file

Performance counter stats for rg -j1 -F 123 empty_file --mmap:

             0      major-faults
           393      minor-faults

   0.001652534 seconds time elapsed

   0.000000000 seconds user
   0.001648000 seconds sys

It seems that using mmap causes more page faults. Does anyone know how to do deep tracing on Linux, so that the code that incurs the minor page faults can be shown? Currently my suspicion is munmap.

Answer by Peter Cordes

Using mmap to read a big file normally involves soft (minor) page faults when you first touch the mmapped region, unless you use MAP_POPULATE. (That also waits for I/O if the file wasn't already hot in the pagecache, so most programs don't want it.)
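To make the first-touch faults concrete, here is a minimal Python sketch (not ripgrep's code) that reads the same file once through a private read-only mapping and once with preadv into a reused buffer, reporting the change in the process's minor-fault counter for each. The helper names make_file, scan_mmap and scan_pread are made up for this example, the 64 KiB buffer size is an arbitrary assumption, and the interpreter takes some faults of its own, so treat the counts as indicative only.

```python
import mmap
import os
import resource
import tempfile

PAGE = mmap.PAGESIZE


def minor_faults():
    """Minor (soft) page faults taken so far by this process."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_minflt


def make_file(pages=2048):
    """Create a temp file of `pages` pages; it stays hot in the pagecache."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(os.urandom(pages * PAGE))
    return path


def scan_mmap(path):
    """Touch one byte per page through a read-only private mapping."""
    with open(path, "rb") as f:
        before = minor_faults()
        with mmap.mmap(f.fileno(), 0, flags=mmap.MAP_PRIVATE,
                       prot=mmap.PROT_READ) as mm:
            total = sum(mm[i] for i in range(0, len(mm), PAGE))
        return total, minor_faults() - before


def scan_pread(path, chunk=64 * 1024):
    """Read the file with preadv into one reused buffer (size is an assumption)."""
    buf = bytearray(chunk)
    fd = os.open(path, os.O_RDONLY)
    try:
        before = minor_faults()
        total, offset = 0, 0
        while True:
            n = os.preadv(fd, [buf], offset)
            if n == 0:
                break
            total += sum(buf[i] for i in range(0, n, PAGE))
            offset += n
        return total, minor_faults() - before
    finally:
        os.close(fd)


if __name__ == "__main__":
    path = make_file()
    try:
        print("mmap  minor faults:", scan_mmap(path)[1])
        print("pread minor faults:", scan_pread(path)[1])
    finally:
        os.unlink(path)
```

With the file hot in the pagecache, I'd expect the mmap scan to take extra faults that grow with the file size, while the pread scan mostly faults only the first time its buffer is written. I haven't measured this across kernels, so check it on your own system.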

Fault-around (wiring neighbouring pages into the page tables when one faults) makes the fault cost usually not too bad. The kernel should notice a sequential read pattern in the page faults and wire up multiple pages instead of just the one that faulted.

madvise(MADV_SEQUENTIAL) might help with that, or maybe only with I/O from disk. Doing MADV_POPULATE_READ from another thread might be a good idea; I don't know, I haven't tested it. MAP_POPULATE on the initial mmap has the downside that mmap doesn't return until everything is populated, so you can't even get started reading the file and can't overlap computation with I/O. But getting the kernel working on checking the pages and wiring them into your page table in parallel with whatever you're doing could be helpful.
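One way to try these hints is a small Python sketch that maps a file, optionally applies a madvise hint, then touches one byte per page and reports the minor-fault delta. Assumptions: the helper names are invented for this example; Python's mmap module may not export MADV_POPULATE_READ, so the sketch falls back to the Linux UAPI value 22 (the hint needs Linux 5.14 or later); and the hint is applied synchronously here rather than from a second thread, purely to keep the example short.

```python
import mmap
import os
import resource
import tempfile

# Assumption: fall back to the Linux UAPI constant if Python does not export it.
MADV_POPULATE_READ = getattr(mmap, "MADV_POPULATE_READ", 22)


def minor_faults():
    """Minor (soft) page faults taken so far by this process."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_minflt


def faults_with_advice(path, advice=None):
    """Map `path`, optionally madvise() it, then touch one byte per page.

    Returns (applied, faults): whether the kernel accepted the hint, and the
    minor faults taken while touching the pages (not while applying the hint).
    """
    with open(path, "rb") as f, mmap.mmap(
        f.fileno(), 0, flags=mmap.MAP_PRIVATE, prot=mmap.PROT_READ
    ) as mm:
        applied = False
        if advice is not None:
            try:
                mm.madvise(advice)  # applies to the whole mapping
                applied = True
            except OSError:
                pass  # e.g. EINVAL on kernels without this hint
        before = minor_faults()
        sum(mm[i] for i in range(0, len(mm), mmap.PAGESIZE))
        return applied, minor_faults() - before


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(os.urandom(2048 * mmap.PAGESIZE))
        for name, advice in [("no hint", None),
                             ("MADV_SEQUENTIAL", mmap.MADV_SEQUENTIAL),
                             ("MADV_POPULATE_READ", MADV_POPULATE_READ)]:
            print(name, faults_with_advice(path, advice))
    finally:
        os.unlink(path)
```

If the populate hint is accepted, the touch loop afterwards should see few or no faults in the mapping, because the page-table work has moved into the madvise call. A real program would want that call on a helper thread so it overlaps with the search.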


perf record -e page-faults should be able to record the faulting instructions. Your program doesn't trigger any major page faults (the kind that have to sleep for I/O), because your big file isn't so big that the kernel can't keep it hot in the pagecache. So the only page faults recorded will be the minor ones.

(I didn't test this, and I don't know what the default sample granularity is for the page-faults event, or whether it would record a sample for every page fault by default.)