I'm working on the render part of Assignment 2 of CMU's 15-418 course, which involves writing a high-performance circle renderer in CUDA. In my code, each CUDA thread is responsible for computing a single pixel, which is a data-parallel task.
To accelerate the computation, I stage the per-circle data in shared memory:
__shared__ float smem_rads[32];
__shared__ float smem_colors[32 * 3];
__shared__ float smem_ps[32 * 3];

// Refill the shared-memory staging arrays every 32 circles.
for (int index = 0; index < cuConstRendererParams.numCircles; ++index) {
    int index3 = 3 * index;
    if (index % 32 == 0) {
        __syncthreads();
        if (threadId < 32) {  // threadId: linearized thread index within the block
            // Copy the next 32 circles' positions (3 floats each),
            // radii, and colors from global to shared memory.
            smem_ps[threadId]      = cuConstRendererParams.position[index3 + threadId];
            smem_ps[threadId + 32] = cuConstRendererParams.position[index3 + threadId + 32];
            smem_ps[threadId + 64] = cuConstRendererParams.position[index3 + threadId + 64];
            smem_rads[threadId]    = cuConstRendererParams.radius[index + threadId];
            smem_colors[threadId]      = cuConstRendererParams.color[index3 + threadId];
            smem_colors[threadId + 32] = cuConstRendererParams.color[index3 + threadId + 32];
            smem_colors[threadId + 64] = cuConstRendererParams.color[index3 + threadId + 64];
        }
        __syncthreads();
    }
    // Read the current circle's values back from shared memory.
    float3 p   = ((float3*)smem_ps)[index % 32];
    float  rad = smem_rads[index % 32];
    float3 rgb = ((float3*)smem_colors)[index % 32];  // was "float rgb": type fixed
    // ... per-pixel coverage test and blending using p, rad, rgb ...
}
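For context, the loop body consumes these values roughly as follows. This is a simplified sketch: the blending itself happens in the assignment's `shadePixel`-style helper, and the exact variable names (`pixelCenterX`, `pixelCenterY`) are placeholders for what my kernel computes from the thread's pixel coordinates.

```
// Sketch of how each thread (one pixel) uses the cached circle data.
// pixelCenterX/pixelCenterY are this thread's normalized pixel center
// (names are placeholders, not from the snippet above).
float diffX = p.x - pixelCenterX;
float diffY = p.y - pixelCenterY;
if (diffX * diffX + diffY * diffY <= rad * rad) {
    // Pixel is covered by this circle: blend rgb into the pixel's
    // accumulated color (done by the starter code's shading helper).
}
```

Note that within a warp, every thread reads the same `index % 32` slot in the same iteration, so I expected broadcasts rather than conflicts.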
However, when I profile the code with Nsight Compute, it reports shared-memory bank conflicts.

Could someone please explain why these bank conflicts occur and suggest ways to resolve them? I'd greatly appreciate any insights or recommendations for optimizing my shared-memory access patterns.
Other Relevant Information:
- CUDA version: 12.2
- GPU model: NVIDIA RTX 3090
I have tried changing 32 to other values, but it did not help.
EDIT: Thank you for all your assistance. I switched to __ldg() for loading the read-only global memory, and the bank conflicts disappeared. However, it appears that bank conflicts were not the primary factor limiting performance.
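For anyone curious, the change was essentially to drop the shared-memory staging and load each circle's data directly through the read-only data cache. A rough sketch (assuming the cuConstRendererParams arrays are plain float* in global memory, as in my original code; the field names are from the snippet above):

```
// Load circle data via the read-only data cache instead of shared
// memory. __ldg takes a const pointer, so the array pointers may
// need a const cast depending on how the struct is declared.
int index3 = 3 * index;
float3 p = make_float3(__ldg(&cuConstRendererParams.position[index3]),
                       __ldg(&cuConstRendererParams.position[index3 + 1]),
                       __ldg(&cuConstRendererParams.position[index3 + 2]));
float rad = __ldg(&cuConstRendererParams.radius[index]);
float3 rgb = make_float3(__ldg(&cuConstRendererParams.color[index3]),
                         __ldg(&cuConstRendererParams.color[index3 + 1]),
                         __ldg(&cuConstRendererParams.color[index3 + 2]));
```

Since all threads in the block read the same circle in the same iteration, these loads hit the cache after the first warp, which is presumably why shared memory was not buying much here.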