If each piece of data takes 128 bits or more, is there any advantage to grouping them in memory?

Question

166 views Asked by subb At 07 August 2013 at 01:07

I've read in the CUDA Programming Guide that global memory on a CUDA device is accessed in transactions of 32, 64, or 128 bits. Knowing that, is there any advantage to, say, having a set of float4 values (128 bits each) close together in memory? As I understand it, whether the float4 values are scattered in memory or stored in sequence, the number of transactions will be the same. Or will all the accesses be coalesced into one gigantic transaction?

Original Q&A

There is 1 answer

**Robert Crovella** · Accepted Answer · 2013-08-07T02:11:45+00:00

Coalescing refers to combining memory requests from individual threads in a warp into a single memory transaction.

A single memory transaction is typically a 128-byte cache line, so it holds eight 128-bit (e.g. float4) quantities.

So, yes, there is a benefit to having multiple threads requesting adjacent 128 bit quantities, because these can still be coalesced into a single (128 byte) cache line request to memory.
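To make the arithmetic concrete, here is a rough host-side model, not CUDA code and not a description of any particular GPU. It assumes a 32-thread warp, 128-byte cache lines, and 16-byte-aligned float4 loads, and counts how many distinct lines one warp touches. The function name `lines_touched` and its parameters are made up for illustration.

```python
# Rough model only: counts the distinct 128-byte lines touched when each
# thread of a warp loads one 16-byte float4. Warp size and line size are
# assumptions, not values queried from a device.
WARP_SIZE = 32      # assumed threads per warp
LINE_BYTES = 128    # assumed cache line / transaction size
FLOAT4_BYTES = 16   # 4 x 32-bit floats = 128 bits


def lines_touched(stride_elems, base_addr=0):
    """Distinct 128-byte lines touched when thread t loads the float4
    at element index t * stride_elems (hypothetical access pattern)."""
    lines = set()
    for t in range(WARP_SIZE):
        first = base_addr + t * stride_elems * FLOAT4_BYTES
        last = first + FLOAT4_BYTES - 1
        # Record every line the 16-byte load overlaps.
        lines.add(first // LINE_BYTES)
        lines.add(last // LINE_BYTES)
    return len(lines)


print(lines_touched(1))  # adjacent float4 values: prints 4
print(lines_touched(8))  # one float4 per 128-byte line: prints 32
```

Under these assumptions, a warp reading adjacent float4 values needs 4 line requests (32 × 16 bytes = 512 bytes), while a warp whose float4 values each sit in a different line needs 32, so grouping the data still pays off even though each element is already 128 bits.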
