How to profile a CUDA application on compute capability 7.x? Is the metric "dram_read_throughput" valid in Nsight Compute?

My setup environment:
- CUDA 10.2
- Device: RTX 2080
- OS: Ubuntu 16.04

When I try to use nvprof, I find that it doesn't support devices with compute capability 7.2 and higher. It recommends using Nsight Compute or Nsight Systems instead. But I cannot launch either of them because the server has no graphical interface. How can I use Nsight Compute on a remote server? Also, is it possible to profile metrics in Nsight Compute?
Asked by fishmingee

1 answer
For compute capability 7.5 and higher, the recommended tools are Nsight Compute and Nsight Systems. The documentation for Nsight Compute is here, and the documentation for Nsight Systems is here. There is an introductory blog describing these "new" CUDA profiler tools here, a tutorial blog on Nsight Systems here, and a tutorial blog on Nsight Compute here. The introductory blog describes why there are two tools and how they relate to each other.
Is the metric `dram_read_throughput` valid in Nsight Compute? It is not. The naming format of that metric indicates it is an nvprof metric, and nvprof metric names generally cannot be used directly in Nsight Compute. To find out whether Nsight Compute has an "equivalent" metric for a given nvprof metric, use the nvprof transition guide, in particular its metric comparison table. By studying that table, you'll note that there is a Nsight Compute metric equivalent to `dram_read_throughput`, and it is named `dram__bytes_read.sum.per_second`. For instructions on how to capture this metric in Nsight Compute, please refer to the blog I already mentioned here, or to the documentation here.

As for using Nsight Compute on a remote server: if you have the CUDA toolkit installed on the remote server, you should be able to run Nsight Compute in command-line interface (CLI) mode. That is described in both the documentation and the blog article already linked. Alternatively, you may be able to run the GUI in remote mode, as described here.
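As a rough illustration of the CLI route on a headless server, the sketch below translates nvprof metric names into Nsight Compute names and assembles the command line to run. It is written in C++ purely for illustration; the helper names `ncuMetricFor` and `buildNcuCommand` and the application path `./my_app` are hypothetical. Only the `dram_read_throughput` mapping comes from the answer above; the `dram_write_throughput` entry is an assumption to verify against the transition guide's metric comparison table. The CLI executable is `nv-nsight-cu-cli` in the CUDA 10.2 toolkit and was renamed `ncu` in later releases; `--metrics` takes a comma-separated list of metric names.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: maps an nvprof metric name to its Nsight Compute
// equivalent, returning an empty string when the name is not in this table.
// dram_read_throughput comes from the answer; the write-side entry is an
// assumption to be checked against the nvprof transition guide.
inline std::string ncuMetricFor(const std::string& nvprofName) {
    static const std::map<std::string, std::string> table = {
        {"dram_read_throughput", "dram__bytes_read.sum.per_second"},
        {"dram_write_throughput", "dram__bytes_write.sum.per_second"},
    };
    const auto it = table.find(nvprofName);
    return it == table.end() ? std::string() : it->second;
}

// Hypothetical helper: assembles "<cli> --metrics m1,m2 <app>".
// cli is the Nsight Compute CLI executable (nv-nsight-cu-cli in CUDA 10.2,
// ncu in later toolkits); app is the application to profile.
inline std::string buildNcuCommand(const std::string& cli,
                                   const std::vector<std::string>& metrics,
                                   const std::string& app) {
    std::string cmd = cli;
    if (!metrics.empty()) {
        cmd += " --metrics ";
        for (std::size_t i = 0; i < metrics.size(); ++i) {
            if (i > 0) cmd += ',';  // metrics are passed comma-separated
            cmd += metrics[i];
        }
    }
    cmd += ' ';
    cmd += app;
    return cmd;
}
```

For example, `buildNcuCommand("nv-nsight-cu-cli", {ncuMetricFor("dram_read_throughput")}, "./my_app")` yields `nv-nsight-cu-cli --metrics dram__bytes_read.sum.per_second ./my_app`. Running that command in a shell on the server prints the per-kernel metric values to the terminal, so no graphical interface is needed.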
Is it possible to profile metrics in Nsight Compute? Yes, as already covered above.
I won't be able to use this question/answer to debug remote connection details or any other follow-up questions about specific access cases or usage scenarios of Nsight tools. There are documentation and tutorials already available. If you have another specific question, please ask a new question. To locate resources for Nsight Compute and Nsight Systems, I suggest simply googling those names. Usually the first hits will be landing pages here and here which link to all of the above resources, plus additional resources such as video tutorials describing specific cases and advanced usage.
All of these tools are available on Windows as well, with similar user interfaces. Furthermore, these tools can (and should) be used for any GPU of compute capability 7.0 or higher.