I am trying to run nvprof from the command line on R. Here is how I am invoking it:
./nvprof --print-gpu-trace --devices 0 --analysis-metrics --export-profile /home/xxxxx/%p R
This gives me an R prompt where I write R code. I can do the same with Rscript.
The problem is that when I pass the --analysis-metrics option, I get many lines similar to ==44041== Replaying kernel "void ger_kernel(cublasGerParams)"
and the R process never ends. I am not sure what I am missing.
nvprof doesn't modify process exit behavior, so I think you're just suffering from slowness because your app invokes a lot of kernels. You have two options to speed this up.
1. Selectively profiling metrics
The --analysis-metrics option enables collection of a number of metrics, which requires kernels to be replayed, with a different set of metrics collected on each kernel run. If your application has a lot of kernel invocations, this can take time. I'd suggest you query the available metrics with the command
nvprof --query-metrics
and then manually choose the metrics you are interested in. Once you know which metrics you want, you can collect them using
nvprof -m metric_1,metric_2,...
This way, nvprof collects fewer metrics, which requires fewer replays and runs faster.
2. Selectively profiling kernels
Alternatively, you can profile only a specific kernel using the option
--kernels <context id/name>:<stream id/name>:<kernel name>:<invocation>
For example,
nvprof --kernels ::foo:2 --analysis-metrics ./your_cuda_app
will profile all analysis metrics for the kernel whose name contains the string foo, and only on its second invocation. This option takes regular expressions, and is quite powerful.
You can mix and match the above two approaches to speed up profiling. You can find more help about these and other nvprof options using the command
nvprof --help
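To make the mix-and-match idea concrete, here is a rough sketch of how the two approaches might be combined for the R case in the question. Treat it as an illustration under assumptions rather than a tested recipe: the metric names achieved_occupancy and sm_efficiency are just examples (check what nvprof --query-metrics reports on your GPU), my_script.R is a hypothetical script name, and the kernel filter reuses the ger_kernel name from the replay messages. The script only prints the command unless you set RUN=1.

```shell
#!/bin/sh
# Dry-run sketch: build an nvprof command that limits both the metrics
# collected and the kernels profiled. Nothing is profiled unless RUN=1.

# Example metrics (an assumption): choose yours from `nvprof --query-metrics`.
METRICS="achieved_occupancy,sm_efficiency"

# Kernel filter: any context, any stream, kernel name containing
# "ger_kernel", first invocation only.
KERNELS="::ger_kernel:1"

# Hypothetical application command; replace with your own R script.
APP="Rscript my_script.R"

# --kernels is placed before -m so that the filter applies to the metrics.
CMD="nvprof --devices 0 --kernels $KERNELS -m $METRICS $APP"

# Show the command that would be run.
echo "$CMD"

# Run it for real only when explicitly requested.
if [ "${RUN:-0}" = "1" ]; then
    $CMD
fi
```

With only two metrics and a single kernel invocation, far fewer replays should be needed than with --analysis-metrics across every kernel, so the R process has a much better chance of finishing in reasonable time.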