Nsys CLI profiling guidance


I am new to CUDA development and am trying to profile my code. I expected to use the nvprof tool for profiling, but it prints the following warning:

======== Warning: This version of nvprof doesn't support the underlying device, GPU profiling skipped

After some searching, I found out that nvprof is a legacy tool and that profiling should now be done with the Nsight Systems CLI. Running nsys nvprof ./myapp generates two files: report1.nsys-rep and report1.sqlite. How can I use these to obtain profiling information about my code?

Environment:

WSL with Ubuntu 20.04

NVIDIA Nsight Systems version 2023.1.2.43-32377213v0

Nvprof: Release version 10.1.243 (21)

NVCC: Cuda compilation tools, release 10.1, V10.1.243

I expect to obtain information similar to what nvprof reports. [image not available]

So far I have only tried nsys nvprof ./myapp. I would like to know whether this is the right command or whether there are better alternatives.

Output of nsys profile --stats=true ./diverged

Generating '/tmp/nsys-report-04e5.qdstrm'
[1/8] [========================100%] report2.nsys-rep
[2/8] [========================100%] report2.sqlite
[3/8] Executing 'nvtx_sum' stats report
SKIPPED: .../sum_reduction/report2.sqlite does not contain NV Tools Extension (NVTX) data.
[4/8] Executing 'osrt_sum' stats report

 Time (%)  Total Time (ns)  Num Calls   Avg (ns)    Med (ns)   Min (ns)  Max (ns)   StdDev (ns)       Name
 --------  ---------------  ---------  ----------  ----------  --------  ---------  -----------  --------------
     74.7        364907400          6  60817900.0  72485919.0   4489170  100201745   42231058.9  poll
     24.3        118728446        345    344140.4     81962.0       541   10034413    1039273.8  ioctl
      0.6          2840826          9    315647.3    449904.0      2254     535093     236455.8  read
      0.2           920219          2    460109.5    460109.5    105991     814228     500799.2  sem_timedwait
      0.1           471795          2    235897.5    235897.5     70382     401413     234074.3  pthread_create
      0.1           310682         25     12427.3      8907.0      2785      95078      18330.8  mmap
      0.0            84580          9      9397.8     10049.0      1473      15419       4316.1  open
      0.0            80611         13      6200.8      4559.0      1382      17002       5451.1  fopen
      0.0            65704          3     21901.3     21310.0     20649      23745       1630.5  write
      0.0            48833         26      1878.2        70.5        60      46898       9182.3  fgets
      0.0            18413          6      3068.8      1738.0      1182       8455       2815.7  fclose
      0.0             8245          1      8245.0      8245.0      8245       8245          0.0  pipe2
      0.0             7233          2      3616.5      3616.5      1853       5380       2494.0  munmap
      0.0             6662          5      1332.4      1533.0       351       1853        579.3  fcntl

[5/8] Executing 'cuda_api_sum' stats report
SKIPPED: .../sum_reduction/report2.sqlite does not contain CUDA trace data.
[6/8] Executing 'cuda_gpu_kern_sum' stats report
SKIPPED: .../sum_reduction/report2.sqlite does not contain CUDA kernel data.
[7/8] Executing 'cuda_gpu_mem_time_sum' stats report
SKIPPED: .../sum_reduction/report2.sqlite does not contain GPU memory data.
[8/8] Executing 'cuda_gpu_mem_size_sum' stats report
SKIPPED: .../sum_reduction/report2.sqlite does not contain GPU memory data.

There are 2 answers

Dnb Dinusha On

For CUDA profiling, you need to run something like nsys profile -t cuda ./test, where the -t option selects which APIs to trace.
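If you end up scripting your profiling runs, the command above can be wrapped as in the rough sketch below. The helper names build_profile_cmd and run_profile, and the default output name report1, are my own choices for illustration and not part of nsys. The options used are --trace (the long form of -t), --stats and --output. The sketch only builds and prints the command unless you call run_profile yourself.

```python
import shutil
import subprocess


def build_profile_cmd(app_argv, traces=("cuda",), output="report1"):
    """Build an `nsys profile` command line (hypothetical helper)."""
    return [
        "nsys", "profile",
        "--trace=" + ",".join(traces),  # long form of -t
        "--stats=true",                 # print summary tables after the run
        "--output=" + output,           # base name of the generated report
        *app_argv,                      # application and its arguments
    ]


def run_profile(app_argv, **kwargs):
    """Run the profile if nsys is on PATH; return its exit code, else None."""
    if shutil.which("nsys") is None:
        return None
    cmd = build_profile_cmd(app_argv, **kwargs)
    return subprocess.run(cmd, check=False).returncode


# Show the command that would be executed for ./test.
print(" ".join(build_profile_cmd(["./test"])))
```

Passing traces=("cuda", "nvtx", "osrt") gives a comma-separated --trace list, which is how nsys expects several trace targets to be combined.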

Zois Tasoulas On

nvprof is a legacy tool and will not be receiving new features. It would be best to switch to Nsight Systems or Nsight Compute, depending on your profiling goals.

Unless you have a specific profiling goal, the suggested strategy is to start with Nsight Systems to determine system-level bottlenecks and identify the kernels that affect performance the most. As a second step, you can use Nsight Compute to profile those kernels and find ways to optimize them.

If you are familiar with nvprof and want to keep using it, Nsight Systems supports the nvprof command. You can find more information in the documentation section Migrating from NVIDIA nvprof, or by running nsys nvprof --help.

"When running nsys nvprof ./myapp 2 files are generated: report1.nsys-rep and report1.sqlite. How can I make use of these to obtain profiling information about my code?"

Regarding the .nsys-rep file, you can view its content using the Nsight Systems GUI, which is available for Windows, Linux (x86_64, SBSA), and macOS. That means you can collect a profile on your target machine, share it, and view it on other machines. For example, you can download the Windows Host package to install the GUI.

You can extract profiling information on a terminal by using the nsys stats and nsys analyze commands. Both commands accept either an .nsys-rep file or an .sqlite file as input.
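As a rough illustration of the nsys stats route, the sketch below asks for a single report from an existing file and returns it as CSV text. The helper names build_stats_cmd and run_stats are hypothetical. The report name cuda_gpu_kern_sum is taken from the stats output shown in the question, and I am assuming your nsys version accepts the --report and --format options in this form (check nsys stats --help if it does not).

```python
import shutil
import subprocess


def build_stats_cmd(report_file, report="cuda_gpu_kern_sum", fmt="csv"):
    """Build an `nsys stats` command for one named report (hypothetical helper)."""
    return ["nsys", "stats", "--report", report, "--format", fmt, report_file]


def run_stats(report_file, report="cuda_gpu_kern_sum"):
    """Return the report as text, or None when nsys is not installed."""
    if shutil.which("nsys") is None:
        return None
    result = subprocess.run(
        build_stats_cmd(report_file, report),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout


# Show the command that would be executed for the file from the question.
print(" ".join(build_stats_cmd("report1.nsys-rep")))
```

The same call should work with report1.sqlite as the input file, since both formats are accepted.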

.sqlite files can also be opened as conventional database files, which would probably be needed for more advanced use cases.
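To give an idea of what treating the export as a database looks like, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (CUPTI_ACTIVITY_KIND_KERNEL with start, end and shortName, and StringIds with id and value) are my assumption about the export schema and may differ between Nsight Systems versions, so list the tables in your own file first. The function names are made up for this example.

```python
import sqlite3
from contextlib import closing
from pathlib import Path

# Assumed names from the Nsight Systems SQLite export; verify them with
# list_tables() on your own file before relying on them.
KERNEL_TABLE = "CUPTI_ACTIVITY_KIND_KERNEL"
STRING_TABLE = "StringIds"


def open_read_only(db_path):
    """Open read-only so a mistyped path does not create an empty file."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def list_tables(db_path):
    """Return the names of all tables in the export, sorted by name."""
    with closing(open_read_only(db_path)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]


def kernel_summary(db_path):
    """Return (kernel name, launches, total ns) rows, or [] without kernel data."""
    if KERNEL_TABLE not in list_tables(db_path):
        return []
    query = f"""
        SELECT s.value, COUNT(*), SUM(k."end" - k.start) AS total_ns
        FROM {KERNEL_TABLE} AS k
        JOIN {STRING_TABLE} AS s ON s.id = k.shortName
        GROUP BY s.value
        ORDER BY total_ns DESC
    """
    with closing(open_read_only(db_path)) as conn:
        return conn.execute(query).fetchall()
```

Calling list_tables("report1.sqlite") is a quick way to see what was actually captured. For a file like report2.sqlite from the question, where nsys reports that no CUDA kernel data is present, kernel_summary would most likely return an empty list.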