I want to build a roofline model for my kernels, so I launch ncu with the following command:
ncu --csv --target-processes all --set roofline mpirun -n 1 ./run_pselinv_linux_release_v2.0 -H H3600.csc -file ./tmpfile
The roofline set collects enough data to build the roofline model, but I can't work out exactly what each metric means.
Compute (SM) Throughput is collected through the metric sm__throughput.avg.pct_of_peak_sustained_elapsed, which reports 0.64%. I assumed this is the percentage of peak performance. But when I divide the Performance (6855693348.37) by the Peak Work (5080428410372), I get 0.13%, which is much lower than 0.64%.
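To make the mismatch concrete, here is a minimal sketch of the division described above. The two numbers are copied from the report; the variable names, the helper function, and the assumption that both values are in the same unit (FLOP/s) are mine, not something ncu states.

```python
# Values quoted from the ncu roofline output in this question.
# Assumption: both are expressed in the same unit (FLOP/s).
performance = 6855693348.37   # "Performance" reported for the kernel
peak_work = 5080428410372     # "Peak Work" reported by the roofline set

# Value reported by sm__throughput.avg.pct_of_peak_sustained_elapsed.
sm_throughput_pct = 0.64


def percent_of_peak(achieved: float, peak: float) -> float:
    """Return achieved / peak expressed as a percentage."""
    return 100.0 * achieved / peak


ratio_pct = percent_of_peak(performance, peak_work)

print(f"Performance / Peak Work = {ratio_pct:.2f}%")
print(f"Reported SM throughput  = {sm_throughput_pct:.2f}%")
```

The two percentages differ by roughly a factor of five, which suggests that the SM throughput metric is not simply Performance divided by Peak Work.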
Besides, I want to collect the FLOPS and memory usage of my kernel, not their throughput.
So my questions are:
1. What is the real meaning of SM Throughput and Memory Throughput? Are they percentages of Peak Work and Peak Traffic? By the way, Peak Work and Peak Traffic are the peak performance and the peak DRAM bandwidth respectively, right?
2. To get the real FLOPS and memory usage of my kernel, I want to multiply the Compute (SM) Throughput by Peak Work to get the real-time performance, and then multiply the real-time performance by the elapsed time to get the FLOPS. The same goes for memory usage. Is my method correct?
I have been searching for an answer for two days but still can't find a clear one.
I found the answer in this question: Terminology used in Nsight Compute. In short, SM Throughput and Memory Throughput are each the maximum of a series of metrics. I had tried to understand them from their names alone, which was totally wrong.
By the way, the correct way to collect the FLOPS and memory usage of your model is shown in this lab: Roofline Model on NVIDIA GPUs. The methodology this lab