Page walk cycles effect on total execution cycles


I measured cycles and dTLB_load_misses.walk_active for an application using the perf tool. I ran the application twice: first in isolation, and then with a single-threaded memory-bound application running on another core of the same socket. I noticed that cycles stays almost constant between the two runs, but dTLB_load_misses.walk_active increases, sometimes by up to 50%.

My understanding is that cycles is the total number of CPU cycles the application used during its execution, and that dTLB_load_misses.walk_active is the total number of cycles during which the page miss handler (PMH) was actively walking the page table. If this is correct, how is it possible that PMH cycles increase significantly while total cycles remain steady, or even decrease slightly in some cases?

The process running in isolation:

Performance counter stats for './wc ../../data/wc/300MB_1M_Keys.txt -p 10':
                                 
      186904181366      cycles                                                        (66.45%)
       13002068556      dTLB_load_misses.walk_active                                     (66.78%)
         176249928      dTLB-loads-misses         #    0.69% of all dTLB cache hits   (66.81%)
       25500563014      dTLB-loads                                                    (66.57%)
         469338866      cache-misses              #   55.629 % of all cache refs      (66.73%)
         843696674      cache-references                                              (66.64%)

       9.257579987 seconds time elapsed

The process running alongside a single-threaded memory-bound process:

Performance counter stats for './wc ../../data/wc/300MB_1M_Keys.txt -p 10':

      184567329729      cycles                                                        (66.50%)
       14006882920      dTLB_load_misses.walk_active                                     (66.61%)
         229413645      dTLB-loads-misses         #    0.89% of all dTLB cache hits   (66.82%)
       25693682363      dTLB-loads                                                    (66.78%)
         472148745      cache-misses              #   55.619 % of all cache refs      (66.68%)
         848898094      cache-references                                              (66.61%)

       9.243842709 seconds time elapsed
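
To make the comparison between the two runs concrete, here is a minimal sketch that derives a few ratios from the counters above. The counter values are copied from the two outputs; the dictionary keys and helper names (`walk_fraction`, `relative_change`, `walk_cycles_per_miss`) are made up for illustration. The walk-cycles-per-miss ratio is only a rough indicator, since I am assuming (without having verified it) that dTLB-loads-misses and dTLB_load_misses.walk_active refer to the same set of page walks.

```python
# Counter values copied from the two perf stat outputs above.
isolated = {
    "cycles": 186_904_181_366,
    "walk_active": 13_002_068_556,
    "dtlb_load_misses": 176_249_928,
}
co_run = {
    "cycles": 184_567_329_729,
    "walk_active": 14_006_882_920,
    "dtlb_load_misses": 229_413_645,
}


def walk_fraction(run):
    """Share of all cycles during which a load page walk was active."""
    return run["walk_active"] / run["cycles"]


def relative_change(before, after):
    """Relative change from `before` to `after` (0.10 means +10%)."""
    return (after - before) / before


def walk_cycles_per_miss(run):
    """Rough average walk-active cycles per dTLB load miss.

    Assumption: both events describe the same page walks.
    """
    return run["walk_active"] / run["dtlb_load_misses"]


for name, run in (("isolated", isolated), ("co-run", co_run)):
    print(
        f"{name}: walk_active/cycles = {walk_fraction(run):.2%}, "
        f"walk cycles per miss = {walk_cycles_per_miss(run):.1f}"
    )

for key in isolated:
    change = relative_change(isolated[key], co_run[key])
    print(f"{key}: {change:+.2%} (co-run vs. isolated)")
```

For these two particular runs, walk_active is roughly 7% of cycles in both cases and grows by about 8%, while cycles drops by about 1% and dTLB load misses grow by about 30%. Note also that the percentages in parentheses in the perf output show each event was counted for only about two thirds of the run, so the reported values are scaled estimates rather than exact counts.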

EDIT:

Transparent huge pages, THP defrag, numa_balancing, and turbo boost are disabled.

My CPU is an Intel Xeon Gold 6142 (Skylake).
