Bank Conflict Issue in CUDA Shared Memory Access


I'm working on the renderer part of Assignment 2 for CMU's 15-418 course, which involves writing a high-performance renderer using CUDA. In my code, each CUDA thread is responsible for computing a single pixel, which makes this a data-parallel task.

To accelerate the computation, I'm using shared memory to store intermediate results:

__shared__ float smem_rads[32];
__shared__ float smem_colors[32 * 3];
__shared__ float smem_ps[32 * 3];

// Load values into the shared memory arrays every 32 indices

for (int index = 0; index < cuConstRendererParams.numCircles; ++index) {
    int index3 = 3 * index;
    if (index % 32 == 0) {
        __syncthreads();
        if (threadId < 32) {
            smem_ps[threadId] = cuConstRendererParams.position[index3 + threadId];
            smem_ps[threadId + 32] = cuConstRendererParams.position[index3 + threadId + 32];
            smem_ps[threadId + 64] = cuConstRendererParams.position[index3 + threadId + 64];
            smem_rads[threadId] = cuConstRendererParams.radius[index + threadId];
            smem_colors[threadId] = cuConstRendererParams.color[index3 + threadId];
            smem_colors[threadId + 32] = cuConstRendererParams.color[index3 + threadId + 32];
            smem_colors[threadId + 64] = cuConstRendererParams.color[index3 + threadId + 64];
        }
        __syncthreads();
    }

    // Access the shared memory values later
    float3 p = ((float3*)smem_ps)[index % 32];
    float rad = smem_rads[index % 32];
    float3 rgb = ((float3*)smem_colors)[index % 32];
}

However, when I profile the code with Nsight Compute, it reports shared memory bank conflicts.

[Screenshot: bank conflicts shown by Nsight Compute]

Could someone please explain the reason for these bank conflicts and suggest ways to resolve them? I'd greatly appreciate any insights or recommendations to optimize my shared memory access patterns.

Other Relevant Information:

  • CUDA version: 12.2
  • GPU model: NVIDIA RTX 3090

I tried changing 32 to other values, but that did not help.
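As background for the question above, here is a minimal host-side sketch of the usual shared memory bank model. It assumes 32 banks of 4-byte words, the layout documented for recent NVIDIA GPUs such as the RTX 3090. The names `bankOf`, `conflictDegree`, and `stridedAccess` are hypothetical helpers for illustration, not part of CUDA or of the assignment code.

```cuda
#include <algorithm>
#include <array>
#include <cstddef>
#include <set>

// Assumption: 32 banks, each 4 bytes wide (one 32-bit word per bank per cycle).
constexpr int kNumBanks = 32;
constexpr int kWarpSize = 32;

// Bank that holds the 32-bit word at `wordIndex` of a shared array.
__host__ __device__ inline int bankOf(int wordIndex) {
    return wordIndex % kNumBanks;
}

// Host-side model of one 32-bit access by a full warp.
// Returns the largest number of DISTINCT words that fall into one bank:
// 1 means conflict-free, n means an n-way conflict.
// Lanes that touch the same word count once, because that case is a broadcast.
int conflictDegree(const std::array<int, kWarpSize>& wordPerLane) {
    std::array<std::set<int>, kNumBanks> wordsInBank;
    for (int word : wordPerLane) {
        wordsInBank[bankOf(word)].insert(word);
    }
    std::size_t worst = 0;
    for (const auto& words : wordsInBank) {
        worst = std::max(worst, words.size());
    }
    return static_cast<int>(worst);
}

// Word indices for the pattern "lane i accesses word base + i * stride".
std::array<int, kWarpSize> stridedAccess(int base, int stride) {
    std::array<int, kWarpSize> words{};
    for (int lane = 0; lane < kWarpSize; ++lane) {
        words[lane] = base + lane * stride;
    }
    return words;
}
```

Under this simplified model, `conflictDegree(stridedAccess(0, 1))` (consecutive stores such as `smem_ps[threadId]`) and `conflictDegree(stridedAccess(5, 0))` (every lane reading the same element, as with `index % 32`) both give 1, while a stride of 2 gives a 2-way conflict and a stride of 32 a 32-way conflict. That suggests the index patterns in the snippet do not, on their own, account for the conflicts Nsight Compute reports.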

EDIT: Thank you for all your assistance. I now use __ldg to load the read-only global memory, and the bank conflicts have disappeared. However, it appears that bank conflicts were not the primary factor limiting performance.
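For readers who want to try the `__ldg` approach mentioned in the EDIT, the following is a minimal sketch of staging circle data into shared memory through `__ldg`. It is not the code from the assignment: the kernel `stageTiles`, the driver `runStageTiles`, the plain pointer parameters (instead of `cuConstRendererParams`), and the `x + radius` accumulation are all assumptions made so the example is self-contained and checkable. It also guards the last, partial tile and lets every thread of the block take part in the load, which the original snippet does not do.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int kTile = 32;  // circles staged per tile, as in the question

// Hypothetical kernel: stages circle data tile by tile through __ldg and
// accumulates (x + radius) over all circles so the host can check the result.
__global__ void stageTiles(const float* __restrict__ position,
                           const float* __restrict__ radius,
                           int numCircles, float* out) {
    __shared__ float smem_ps[kTile * 3];
    __shared__ float smem_rads[kTile];

    float acc = 0.0f;
    for (int base = 0; base < numCircles; base += kTile) {
        int count = min(kTile, numCircles - base);  // guard the partial last tile

        __syncthreads();  // all threads are done reading the previous tile
        // Consecutive threads load consecutive floats via the read-only path.
        for (int i = threadIdx.x; i < 3 * count; i += blockDim.x)
            smem_ps[i] = __ldg(&position[3 * base + i]);
        for (int i = threadIdx.x; i < count; i += blockDim.x)
            smem_rads[i] = __ldg(&radius[base + i]);
        __syncthreads();  // the tile is fully populated

        // Every thread reads the same words, which shared memory broadcasts.
        for (int j = 0; j < count; ++j)
            acc += smem_ps[3 * j] + smem_rads[j];
    }
    out[blockIdx.x * blockDim.x + threadIdx.x] = acc;
}

// Hypothetical host driver: circle j gets x = j and radius 0.5.
// Returns the value computed by thread 0, or -1 if the copy back fails.
float runStageTiles(int numCircles) {
    std::vector<float> pos(3 * numCircles, 0.0f), rad(numCircles, 0.5f);
    for (int j = 0; j < numCircles; ++j) pos[3 * j] = static_cast<float>(j);

    const int threads = 64;
    float *dPos = nullptr, *dRad = nullptr, *dOut = nullptr;
    cudaMalloc(&dPos, pos.size() * sizeof(float));
    cudaMalloc(&dRad, rad.size() * sizeof(float));
    cudaMalloc(&dOut, threads * sizeof(float));
    cudaMemcpy(dPos, pos.data(), pos.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dRad, rad.data(), rad.size() * sizeof(float), cudaMemcpyHostToDevice);

    stageTiles<<<1, threads>>>(dPos, dRad, numCircles, dOut);

    float result = -1.0f;
    cudaError_t err = cudaMemcpy(&result, dOut, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dPos);
    cudaFree(dRad);
    cudaFree(dOut);
    return err == cudaSuccess ? result : -1.0f;
}
```

Calling `runStageTiles(40)` exercises one full tile and one partial tile of 8 circles. Whether `__ldg` changes the bank conflict counters on a given GPU and compiler version is something to confirm in Nsight Compute rather than assume from this sketch.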
