CUDA memory performance: what is global memory bandwidth?


I am learning about CUDA optimizations, and I found this presentation: Optimizing CUDA, by Paulius Micikevicius.

In the section "MAXIMIZE GLOBAL MEMORY BANDWIDTH", the presentation says that coalescing global memory accesses improves bandwidth.

My question: how do you calculate global memory bandwidth? Can anyone explain it with a simple example program?

Best answer (by Yappie):

Theoretical (peak) bandwidth can be calculated from the hardware specifications.

For example, the NVIDIA GeForce GTX 280 uses double data rate (DDR) RAM with a memory clock rate of 1,107 MHz and a 512-bit-wide memory interface. From these figures, the peak theoretical memory bandwidth of the GTX 280 is about 141.7 GB/sec:

(1,107 × 10^6 × (512 / 8) × 2) / 10^9 = 141.696 GB/sec

In this calculation, the memory clock rate is converted into Hz, multiplied by the interface width (divided by 8 to convert bits to bytes), and multiplied by 2 because DDR memory transfers data twice per clock cycle. Finally, the product is divided by 10^9 to convert the result to GB/sec (GBps).
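The same arithmetic can be written as a small host-side helper. This is a minimal sketch: the name theoreticalBandwidthGBs is hypothetical, and the factor of 2 assumes DDR-style memory as in the GTX 280 example. The second part reads the memory clock (reported in kHz) and bus width (in bits) of device 0 through cudaDeviceGetAttribute; on some newer memory types the reported clock and the transfers-per-clock convention differ, so treat that figure as an estimate rather than the vendor's quoted peak.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: peak theoretical bandwidth in GB/s (10^9 bytes/s).
//   clock (Hz) * bus width (bytes) * transfers per clock / 10^9
// dataRate is assumed to be 2 for DDR memory, as in the GTX 280 example.
double theoreticalBandwidthGBs(double memClockHz, int busWidthBits, int dataRate)
{
    return memClockHz * (busWidthBits / 8.0) * dataRate / 1e9;
}

int main()
{
    // GTX 280 figures from the text: 1,107 MHz clock, 512-bit interface, DDR.
    std::printf("GTX 280: %.3f GB/s\n", theoreticalBandwidthGBs(1107e6, 512, 2));

    // Same formula applied to the values reported for device 0.
    int clockKHz = 0, busWidthBits = 0;
    if (cudaDeviceGetAttribute(&clockKHz, cudaDevAttrMemoryClockRate, 0) != cudaSuccess ||
        cudaDeviceGetAttribute(&busWidthBits, cudaDevAttrGlobalMemoryBusWidth, 0) != cudaSuccess) {
        std::printf("no CUDA device available\n");
        return 1;
    }
    // Attribute clock is in kHz, so multiply by 10^3 to get Hz.
    std::printf("device 0: %.1f GB/s (memory clock %d kHz, %d-bit bus)\n",
                theoreticalBandwidthGBs(clockKHz * 1e3, busWidthBits, 2),
                clockKHz, busWidthBits);
    return 0;
}
```

The first line prints `GTX 280: 141.696 GB/s`, which matches the hand calculation; the device 0 line depends on the installed GPU.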

Effective bandwidth is calculated by timing specific program activities and by knowing how data is accessed by the program. To do so, use this equation:

Effective bandwidth = ((Br + Bw) / 10^9) / time

Here, the effective bandwidth is in units of GBps, Br is the number of bytes read per kernel, Bw is the number of bytes written per kernel, and time is given in seconds.
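Since the question asks for a simple program, here is a minimal sketch of measuring effective bandwidth with CUDA events. It times a copy kernel in which each of n threads reads one float and writes one float, so Br = Bw = n × 4 bytes. The kernel name stridedCopy, the helper effectiveBandwidthGBs, the element count, the block size, and the stride values are all illustrative assumptions. Stride 1 gives coalesced accesses (consecutive threads touch consecutive addresses); larger strides scatter each warp's accesses, so the memory system moves data the kernel never uses and the measured effective bandwidth typically drops, which is the coalescing effect mentioned in the presentation.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper implementing: ((Br + Bw) / 10^9) / time, time in seconds.
double effectiveBandwidthGBs(double bytesRead, double bytesWritten, double seconds)
{
    return ((bytesRead + bytesWritten) / 1e9) / seconds;
}

// Each of the n threads copies one float located at index tid * stride.
__global__ void stridedCopy(float* out, const float* in, int n, int stride)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n) {
        size_t i = (size_t)tid * stride;
        out[i] = in[i];
    }
}

int main()
{
    const int n = 1 << 22;          // floats copied per launch (assumed size)
    const int maxStride = 8;        // buffers are sized for the largest stride
    const size_t allocBytes = (size_t)n * maxStride * sizeof(float);

    float *in = nullptr, *out = nullptr;
    if (cudaMalloc(&in, allocBytes) != cudaSuccess ||
        cudaMalloc(&out, allocBytes) != cudaSuccess) {
        std::printf("cudaMalloc failed\n");
        return 1;
    }
    cudaMemset(in, 0, allocBytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    for (int stride = 1; stride <= maxStride; stride *= 2) {
        stridedCopy<<<blocks, threads>>>(out, in, n, stride);  // warm-up launch
        cudaEventRecord(start);
        stridedCopy<<<blocks, threads>>>(out, in, n, stride);  // timed launch
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);

        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);  // elapsed time in milliseconds

        // Br = Bw = n * sizeof(float), independent of the stride.
        double bytes = (double)n * sizeof(float);
        std::printf("stride %d: %.1f GB/s\n",
                    stride, effectiveBandwidthGBs(bytes, bytes, ms / 1e3));
    }

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(in);
    cudaFree(out);
    return 0;
}
```

Comparing the stride 1 result with the theoretical peak of the same GPU shows how close a coalesced kernel gets to the hardware limit; the exact numbers depend on the GPU.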

More information is available in the CUDA Best Practices Guide.