I am using rg + perf to measure mmap performance against pread, using minor page faults as a performance indicator. Here are the results:
mmap
perf stat -e major-faults,minor-faults rg -j1 -F 123 a-big-file --mmap
0 major-faults
509 minor-faults
0.002241400 seconds time elapsed
0.000000000 seconds user
0.002221000 seconds sys
no-mmap
perf stat -e major-faults,minor-faults rg -j1 -F 123 a-big-file --no-mmap
0 major-faults
396 minor-faults
0.002911774 seconds time elapsed
0.002890000 seconds user
0.000000000 seconds sys
rg against an empty file
Performance counter stats for 'rg -j1 -F 123 empty_file --mmap':
0 major-faults
393 minor-faults
0.001652534 seconds time elapsed
0.000000000 seconds user
0.001648000 seconds sys
It seems that using mmap causes more minor page faults. Does anyone know how to do deep tracing on Linux, so that the code that incurs the minor page faults can be shown? My current suspicion is munmap.
Using mmap to read a big file normally involves soft (minor) page faults when you first touch the mmapped region, unless you use MAP_POPULATE (which also waits for I/O if the data wasn't already hot in the pagecache, so most programs don't want that).

Fault-around (wiring neighbouring pages into the page tables when one faults) usually makes the fault cost not too bad. The kernel should notice a sequential access pattern in the page faults and wire up multiple pages instead of just the one that faulted.
madvise(MADV_SEQUENTIAL) might help with that, or maybe only with I/O from disk. Doing MADV_POPULATE_READ from another thread might be a good idea; IDK, I haven't tested it. MAP_POPULATE on the initial mmap has the downside of not letting mmap return, so you can't even get started reading the file and can't overlap computation with I/O. But getting the kernel working on checking the pages and wiring them into your page table in parallel with whatever you're doing could be helpful.

perf record -e page-faults should be able to record the faulting instructions. Since your program doesn't trigger any major page faults (the ones that have to sleep for I/O; your big file isn't so big that the kernel can't keep it hot in the pagecache), the only page faults will be the minor ones.

(I didn't test this, and IDK what the default sample granularity is for the page-faults event; IDK whether it would record a sample for every page fault by default.)