I am having trouble finding out where the data for local memory usage is. Right now, I only know to look for STL instructions in the source. I wish I could find concrete numbers.
How do I analyze register spills with Nsight Compute?
Asked by Marko Grdinić
The very short answer is apparently that Nsight Compute currently doesn't show local memory spills.
However:
- Compile with -Xptxas="-v", i.e. turn on verbose output from the assembler.
- Use the cuFuncGetAttribute API with the CU_FUNC_ATTRIBUTE_LOCAL_SIZE_BYTES attribute if you have a handle to the function.
- Read the local_size_bytes attribute of the compiled kernel object, where your framework exposes one; it is automagically populated after compilation.

[answer assembled from comments and added as a community wiki entry]
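To get concrete numbers out of the verbose assembler output, one option is to parse the log text. The following is a minimal sketch, not part of the original answer. It assumes the log uses the usual "N bytes stack frame, N bytes spill stores, N bytes spill loads" line that follows each "Function properties for ..." line. The helper name parse_spills and the sample log (kernel name and all numbers) are hypothetical and only there for illustration.

```python
import re

# HYPOTHETICAL sample of verbose ptxas output, as printed when compiling
# with -Xptxas="-v". The kernel name and every number here are made up.
SAMPLE = """\
ptxas info    : 0 bytes gmem
ptxas info    : Compiling entry function '_Z6kernelPf' for 'sm_80'
ptxas info    : Function properties for _Z6kernelPf
    128 bytes stack frame, 64 bytes spill stores, 72 bytes spill loads
ptxas info    : Used 255 registers, 360 bytes cmem[0]
"""

# Assumed log layout: a "Function properties" line names the function,
# and the next matching line carries its stack and spill sizes.
FUNC_RE = re.compile(r"Function properties for (\S+)")
SPILL_RE = re.compile(
    r"(\d+) bytes stack frame, (\d+) bytes spill stores, (\d+) bytes spill loads"
)


def parse_spills(log):
    """Map each function name to its stack frame and spill sizes in bytes."""
    result = {}
    current = None
    for line in log.splitlines():
        match = FUNC_RE.search(line)
        if match:
            current = match.group(1)
            continue
        match = SPILL_RE.search(line)
        if match and current is not None:
            stack, stores, loads = map(int, match.groups())
            result[current] = {
                "stack_frame": stack,
                "spill_stores": stores,
                "spill_loads": loads,
            }
            current = None
    return result


if __name__ == "__main__":
    for name, info in parse_spills(SAMPLE).items():
        print(name, info)
```

In practice you would feed it the captured compiler output instead of SAMPLE. Note that the stack frame size covers all per-thread local memory the function uses, not only spilled registers, so the spill store and spill load figures are the ones to watch for register pressure.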