nvprof R gputools code never ends


I am trying to profile R code with "nvprof" from the command line. Here is how I am invoking it:

./nvprof --print-gpu-trace --devices 0 --analysis-metrics --export-profile /home/xxxxx/%p R

This gives me an R prompt, where I type my R code. I can do the same with Rscript.

The problem is that when I pass the --analysis-metrics option, nvprof prints many lines like ==44041== Replaying kernel "void ger_kernel(cublasGerParams)"

and the R process never finishes. I am not sure what I am missing.


There is 1 answer

ApoorvaJ On BEST ANSWER

nvprof doesn't modify process exit behavior, so I think you're just suffering from slowness because your app invokes a lot of kernels. You have two options to speed this up.

1. Selectively profiling metrics

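The --analysis-metrics option enables collection of a large number of metrics. They cannot all be gathered in one pass, so each kernel is replayed several times, with a different set of metrics collected on each replay. That is what the "Replaying kernel" lines in your output are reporting.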

If your application has a lot of kernel invocations, this can take time. I'd suggest you query the available metrics with the nvprof --query-metrics command, and then manually choose the metrics you are interested in.

Once you know which metrics you want, you can collect just those using nvprof -m metric_1,metric_2,.... This way, nvprof collects fewer metrics, which requires fewer replays, so the application runs faster.
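As a rough sketch of this approach applied to the R case from the question: the script name my_script.R is hypothetical, the two metric names are only examples (check that they appear in the --query-metrics output for your GPU), and nvprof is assumed to be on your PATH rather than run as ./nvprof.

```shell
# Hypothetical script name; replace with your own R script.
SCRIPT="my_script.R"
# Example metric names (an assumption); confirm them with: nvprof --query-metrics
METRICS="achieved_occupancy,gld_efficiency"

if command -v nvprof >/dev/null 2>&1; then
    # Collect only the listed metrics instead of the full --analysis-metrics set.
    nvprof --devices 0 --metrics "$METRICS" Rscript "$SCRIPT"
else
    echo "nvprof not found on PATH" >&2
fi
```

--metrics is the long form of -m. The shorter the metric list, the fewer replays each kernel should need.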

2. Selectively profiling kernels

Alternatively, you can profile only a specific kernel using the --kernels <context id/name>:<stream id/name>:<kernel name>:<invocation> option.

For example, nvprof --kernels ::foo:2 --analysis-metrics ./your_cuda_app will profile all analysis metrics for the kernel whose name contains the string foo, and only on its second invocation. This option takes regular expressions, and is quite powerful.
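For the question's case, a sketch along these lines might restrict the full analysis to the first invocation of the kernel named in the "Replaying kernel" output. The script name my_script.R is hypothetical, nvprof is assumed to be on your PATH, and whether ger_kernel is the kernel you care about is an assumption.

```shell
# Hypothetical script name; replace with your own R script.
SCRIPT="my_script.R"
# Filter fields are context:stream:kernel name:invocation.
# The first two are left empty here, as in the ::foo:2 example.
KERNEL_FILTER="::ger_kernel:1"

if command -v nvprof >/dev/null 2>&1; then
    # Replay only the first matching ger_kernel launch for the analysis metrics.
    nvprof --devices 0 --kernels "$KERNEL_FILTER" --analysis-metrics Rscript "$SCRIPT"
else
    echo "nvprof not found on PATH" >&2
fi
```

--kernels is placed before --analysis-metrics, in the same order as the example above.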


You can mix and match the above two approaches to speed up profiling. You will be able to find more help about these and other nvprof options using the command nvprof --help.
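A combined invocation could look roughly like this. It is only a sketch: the script name, the metric name, and the choice of kernel are all assumptions to adapt to your own run, and nvprof is assumed to be on your PATH.

```shell
# Hypothetical script name; replace with your own R script.
SCRIPT="my_script.R"
# Kernel filter and metric list are examples, not recommendations.
KERNEL_FILTER="::ger_kernel:1"
METRICS="achieved_occupancy"

if command -v nvprof >/dev/null 2>&1; then
    # One kernel invocation and one metric: the fewest replays of the three variants.
    nvprof --devices 0 --kernels "$KERNEL_FILTER" --metrics "$METRICS" Rscript "$SCRIPT"
else
    echo "nvprof not found on PATH" >&2
fi
```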