I have noticed that when I run nsys on my machine with
nsys profile --stats=true -o output-report ./input
it prints the statistics like this:
NVIDIA Nsight Systems version 2022.4.2.50-32196742v0
[5/8] Executing 'cudaapisum' stats report
Time (%)  Total Time (ns)  Num Calls      Avg (ns)      Med (ns)    Min (ns)     Max (ns)   StdDev (ns)  Name
--------  ---------------  ---------  ------------  ------------  ----------  -----------  ------------  ----------------------
    46.7      100,404,793          3  33,468,264.3      22,463.0      12,434  100,369,896  57,938,512.8  cudaMallocManaged
    39.5       84,938,847          1  84,938,847.0  84,938,847.0  84,938,847   84,938,847           0.0  cudaDeviceSynchronize
    13.8       29,677,781          3   9,892,593.7   9,610,457.0   9,514,092   10,553,232     574,154.9  cudaFree
     0.0           82,478          1      82,478.0      82,478.0      82,478       82,478           0.0  cuLibraryLoadData
     0.0           40,588          1      40,588.0      40,588.0      40,588       40,588           0.0  cudaLaunchKernel
     0.0              892          1         892.0         892.0         892          892           0.0  cuModuleGetLoadingMode
The section is labeled "Executing 'cudaapisum' stats report" instead of a descriptive title such as "CUDA API Statistics". Is there a flag I can use to get output like the one below?
The output below isn't from my machine; it's from an AWS machine.
NVIDIA Nsight Systems version 2021.1.1.66-6c5c5cb
CUDA API Statistics:
Time(%)  Total Time (ns)  Num Calls      Average    Minimum    Maximum  Name
-------  ---------------  ---------  -----------  ---------  ---------  ---------------------
   61.5        250696605          3   83565535.0      36197  250541972  cudaMallocManaged
   32.8        133916228          1  133916228.0  133916228  133916228  cudaDeviceSynchronize
    5.7         23226526          3    7742175.3    6373371    9064987  cudaFree
    0.0            56395          1      56395.0      56395      56395  cudaLaunchKernel
The other thing I should mention is that on my machine the profile is automatically saved with a .nsys-rep extension rather than .qdrep. Are the two the same format or different?
I've been trying to find information in the nsys documentation, but I couldn't find any. I've also searched Stack Overflow and NVIDIA's Nsight forum, but nothing has come up so far. Maybe I've missed something. Any help will be appreciated.
Note: both outputs were produced with the same command, just on a slightly different input file.
.nsys-rep is the new extension name for .qdrep files; the file format is the same. The change was introduced in Nsight Systems 2021.4 and is listed in the release notes for that version.
Also note that the tool versions on your local machine and on the AWS machine differ (2022.4.2 versus 2021.1.1). The AWS version predates 2021.4, which is why it still writes .qdrep files.
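Because only the extension changed, a script that has to handle reports from both machines can simply accept either extension. The Python sketch below is illustrative only: the helper name find_report and its preference for .nsys-rep when both files exist are choices made for this example, not anything provided by nsys.

```python
from pathlib import Path
from typing import Optional

# Newer nsys (2021.4 and later) writes .nsys-rep; older versions write .qdrep.
# Only the extension differs, so probe both, preferring the newer name.
REPORT_EXTENSIONS = (".nsys-rep", ".qdrep")


def find_report(base: str) -> Optional[Path]:
    """Return the report file written for `nsys profile -o base`, or None."""
    for ext in REPORT_EXTENSIONS:
        candidate = Path(base + ext)
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    # Hypothetical usage with the base name from the question's command.
    print(find_report("output-report"))
```

Keeping the extensions in one tuple means a future rename would only need a one-line change.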
There isn't a flag to control the output format you are asking about. As a workaround, you could change your workflow slightly: profile your application without the --stats CLI switch and keep the report file (.nsys-rep or .qdrep). Then run the nsys stats command on that report file, selecting the specific stats reports you want.
If you have feature requests for the Nsight Systems tool, please let us know through the NVIDIA Developer Forum.
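The two-step workflow described in the answer can be scripted. The Python sketch below is a hypothetical wrapper, not an nsys feature. It assumes nsys is on PATH and that each report name you pass exists in your nsys version (cudaapisum does in 2022.4.2, per the output in the question). It prints a title of your choosing before each report to approximate the older "CUDA API Statistics:" heading. The titles in REPORT_TITLES are made up for this example, and nsys may still print its own status lines around each report.

```python
"""Hypothetical wrapper: profile once, then print selected nsys stats reports.

Assumptions: `nsys` is on PATH and each report name exists in your nsys
version. REPORT_TITLES holds our own labels; nsys itself does not print them.
"""
import subprocess
import sys

# Made-up titles that mimic the older "CUDA API Statistics:" style headings.
REPORT_TITLES = {"cudaapisum": "CUDA API Statistics"}


def profile_cmd(base, app_args):
    # Step 1: no --stats here, so nsys writes the report file without
    # running any stats reports.
    return ["nsys", "profile", "-o", base, *app_args]


def stats_cmd(report_file, report):
    # Step 2: run a single stats report against an existing report file.
    return ["nsys", "stats", "--report", report, report_file]


def print_reports(report_file, reports):
    for report in reports:
        # Fall back to the raw report name when no custom title is defined.
        title = REPORT_TITLES.get(report, report)
        print(f"{title}:", flush=True)  # flush so the title precedes nsys output
        subprocess.run(stats_cmd(report_file, report), check=True)


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python nsys_titles.py output-report.nsys-rep cudaapisum
    print_reports(sys.argv[1], sys.argv[2:] or ["cudaapisum"])
```

For example, run subprocess.run(profile_cmd("output-report", ["./input"]), check=True) once, then invoke the script (the file name nsys_titles.py is hypothetical) with output-report.nsys-rep and the report names you want. Running one nsys stats call per report keeps each custom title directly above its table.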