I have a project that uses thousands of threads, and I want to use Nsight Systems to profile its CUDA code. However, loading the report takes a while, which I believe is due to the large amount of thread information. The threads also add visual clutter that I don't currently care about.
Is there a way to disable or limit the collection of thread information before loading a report in the Nsight Systems GUI?
If profiling through the CLI
Check the -s/--sample and --cpuctxsw options for the profile or start commands (see the documentation). You can set both to none to minimize the amount of information collected from the CPU side.
If profiling a Linux target: check also the -t/--trace option for the profile or launch commands. Essentially, you would like to exclude osrt from the trace options; it is enabled by default.
If you want to collect only CUDA events, then you can use
    nsys profile -t cuda -s none --cpuctxsw=none <app>
If profiling through the GUI
You can deselect the "Collect CPU IP/backtrace samples" and "Collect CPU context switch trace" boxes.
If profiling a Linux target: you can additionally deselect the "Collect OS runtime libraries trace" box.
Once the data has been collected, it is not possible to exclude it from rendering in the GUI. You can minimize threads, or hide them by right-clicking "Threads" -> "Show less".
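For the CLI route, the CUDA-only invocation can be wrapped in a small shell helper so the flags are not retyped each time. This is only a sketch: the function name cuda_only_profile and the NSYS_DRY_RUN variable are made up for illustration and are not part of Nsight Systems; the nsys flags themselves are the ones given in the answer.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nsight Systems): profile an application
# collecting CUDA trace only, with CPU sampling and CPU context-switch
# tracing disabled, as suggested in the answer.
cuda_only_profile() {
    if [ "$#" -eq 0 ]; then
        echo "usage: cuda_only_profile <app> [args...]" >&2
        return 2
    fi
    if [ "${NSYS_DRY_RUN:-0}" = "1" ]; then
        # Dry run (illustrative assumption): print the command, do not run it.
        echo nsys profile -t cuda -s none --cpuctxsw=none "$@"
    else
        nsys profile -t cuda -s none --cpuctxsw=none "$@"
    fi
}
```

After sourcing the file, setting NSYS_DRY_RUN=1 and calling cuda_only_profile ./my_app prints the command that would be run, which is a quick way to check the flags on a machine without nsys installed.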