List Question
20 TechQA 2024-03-26T13:51:55.177000
load value into upper/lower part of a register
14 views
Asked by SzymonO
Why compile to cubin and not just to PTX?
75 views
Asked by Elena Yudovina
Why the distinction between WMMA and "just" MMA instructions?
68 views
Asked by einpoklum
Does PTX (8.4) not cover smaller-shape WMMA instructions?
109 views
Asked by einpoklum
Can I force certain computations to occur despite their result not being used in the kernel?
161 views
Asked by einpoklum
Functions called by an Input in CUDA
103 views
Asked by Tejas Gupta
CUDA __shfl_down_sync does not work with __match_any_sync
178 views
Asked by SnowSR
Using NVCC-generated PTX file in OpenCL
52 views
Asked by MCx
The meaning of brackets around register in PTX assembly loads/stores
122 views
Asked by Dmitry Mikushin
Confusion about __cvta_generic_to_shared
293 views
Asked by foreverrookie
How to get instruction cost in NVIDIA GPU?
195 views
Asked by sorfkc
How to compare AT&T-assembly-like sources (e.g. CUDA PTX)?
71 views
Asked by einpoklum
Linking error when using NVIDIA's static PTX compiler library & -lpthreads
225 views
Asked by einpoklum
Can I hint to CUDA that it should move a given variable into the L1 cache?
427 views
Asked by emchristiansen
What does --entry take in CUDA's PTX JIT compiler?
47 views
Asked by einpoklum
Is it bad that NVCC generates PTX code that is very generous with registers?
214 views
Asked by Niels Slotboom
PyTorch CUDA: the provided PTX was compiled with an unsupported toolchain
7.4k views
Asked by Prakhar Sharma
Are load and store operations in shared memory atomic?
312 views
Asked by Pierre T.
Can nvcc generate an older PTX ISA version?
48 views
Asked by Steve Cox
Why is PyTorch 1.7 with CUDA 10.1 not compatible with the NVIDIA A100 Ampere architecture (according to the PTX compatibility principle)?
1.7k views
Asked by Seven link bob