Profiling arbitrary CUDA applications

1.8k views Asked by At

I know of the existence of nvvp and nvprof, of course, but for various reasons nvprof does not want to work with my app that involves lots of shared libraries. nvidia-smi can hook into the driver to find out what's running, but I cannot find a nice way to get nvprof to attach to a running process.

There is a flag --profile-all-processes which does actually give me a message "NVPROF is profiling process 12345", but nothing further prints out. I am using CUDA 8.

How can I get a detailed performance breakdown of my CUDA kernels in this situation?

2

There are 2 answers

0
einpoklum On BEST ANSWER

As comments suggest, you simply have to make sure to start the CUDA profiler (now it's NSight Systems or NSight Compute, no longer nvprof) before the processes you want to profile. You could, for example, configure it to run on system startup.

Your inability to profile your application has nothing to do with it being an "app that involves lots of shared libraries" - the profiling tools profile such applications just fine.

0
eval On

I've been looking for the process attach solution too but found no existing tool.

A possible direction is to use lower CUDA API to build a tool or integrate to your tool. See cupti: https://docs.nvidia.com/cupti/r_main.html#r_dynamic_detach