TechQA.

Question

load value into upper/lower part of a register

score 14 · Answer 1 · 2024-03-26T13:51:55.177000

load value into upper/lower part of a register

14 views Asked by SzymonO At 26 March 2024 at 13:51

score 75 · Answer 2 · 2024-03-25T16:19:48.440000

Why compile to cubin and not just to PTX?

75 views Asked by Elena Yudovina At 25 March 2024 at 16:19

score 68 · Answer 3 · 2024-03-14T16:22:26.083000

Why the distinction between WMMA and "just" MMA instructions?

68 views Asked by einpoklum At 14 March 2024 at 16:22

score 109 · Answer 4 · 2024-03-12T12:01:59.037000

Does PTX (8.4) not cover smaller-shape WMMA instructions?

109 views Asked by einpoklum At 12 March 2024 at 12:01

score 161 · Answer 5 · 2023-12-13T19:38:34.507000

Can I force certain computations to occur despite their result not being used in the kernel?

161 views Asked by einpoklum At 13 December 2023 at 19:38

score 103 · Answer 6 · 2023-11-05T04:26:46.517000

Functions called by an Input in CUDA

103 views Asked by Tejas Gupta At 05 November 2023 at 04:26

score 178 · Answer 7 · 2023-09-28T00:19:29.357000

CUDA __shfl_down_sync does not work with __match_any_sync

178 views Asked by SnowSR At 28 September 2023 at 00:19

score 52 · Answer 8 · 2023-09-02T23:46:32.533000

Using NVCC-generated PTX file in OpenCL

52 views Asked by MCx At 02 September 2023 at 23:46

score 122 · Answer 9 · 2023-08-31T10:24:30.053000

The meaning of brackets around register in PTX assembly loads/stores

122 views Asked by Dmitry Mikushin At 31 August 2023 at 10:24

score 293 · Answer 10 · 2023-08-28T12:47:41.560000

Confusion about __cvta_generic_to_shared

293 views Asked by foreverrookie At 28 August 2023 at 12:47

score 195 · Answer 11 · 2023-01-17T07:06:56.767000

How to get instruction cost in NVIDIA GPU?

195 views Asked by sorfkc At 17 January 2023 at 07:06

score 71 · Answer 12 · 2022-11-30T13:54:57.103000

How to compare AT&T-assembly-like sources (e.g. CUDA PTX)?

71 views Asked by einpoklum At 30 November 2022 at 13:54

score 225 · Answer 13 · 2022-10-28T10:12:23.247000

Linking error when using NVIDIA's static PTX compiler library & -lpthreads

225 views Asked by einpoklum At 28 October 2022 at 10:12

score 427 · Answer 14 · 2022-10-13T20:31:27.653000

Can I hint to CUDA that it should move a given variable into the L1 cache?

427 views Asked by emchristiansen At 13 October 2022 at 20:31

score 47 · Answer 15 · 2022-08-27T09:35:34.940000

What does --entry take in CUDA's PTX JIT compiler?

47 views Asked by einpoklum At 27 August 2022 at 09:35

score 214 · Answer 16 · 2022-07-20T18:20:15.050000

Is it bad that NVCC generates PTX code that is very generous with registers?

214 views Asked by Niels Slotboom At 20 July 2022 at 18:20

score 7496 · Answer 17 · 2022-07-17T15:26:10.387000

PyTorch CUDA : the provided PTX was compiled with an unsupported toolchain

7.4k views Asked by Prakhar Sharma At 17 July 2022 at 15:26

score 312 · Answer 18 · 2022-06-11T10:48:32.143000

Are load and store operations in shared memory atomic?

312 views Asked by Pierre T. At 11 June 2022 at 10:48

score 48 · Answer 19 · 2022-04-04T19:58:04.817000

Can nvcc generate an older PTX ISA version

48 views Asked by Steve Cox At 04 April 2022 at 19:58

score 1778 · Answer 20 · 2022-03-03T06:46:07.630000

Why is PyTorch 1.7 with CUDA 10.1 not compatible with the NVIDIA A100 Ampere architecture (according to the PTX compatibility principle)?

1.7k views Asked by Seven link bob At 03 March 2022 at 06:46
