I have GPU/CUDA code that processes a cube (a 3D image; a spectral cube, to be precise). Think of the cube as a series of images/slices, or alternatively as a bunch of spectra at different spatial locations (on a square grid). Each pixel of an image has different x, y values and the same z; each pixel of a spectrum has the same x, y but a varying z. The cube is laid out in memory so that two consecutive addresses correspond to x and x+1.
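For concreteness, the indexing I have in mind looks roughly like this (`nx`, `ny` are placeholder names for the spatial dimensions, not my actual code):

```cpp
// x is the fastest-varying dimension: consecutive addresses hold x and x+1.
inline long cube_index(long x, long y, long z, long nx, long ny)
{
    return (z * ny + y) * nx + x;
}
// Consequence: an image/slice (fixed z) occupies nx*ny contiguous elements,
// while a spectrum (fixed x,y) is strided by nx*ny elements between z samples.
```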
In my CUDA code I configured each CUDA thread to process a different spectrum; this way I achieve global memory coalescing. Then I ported this code to the Intel Xeon Phi (#pragma offload + OpenMP). As in the GPU case, I have each Phi core process a different spectrum, so memory coalescing is achieved here as well. However, the performance is poor.
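The Phi version is structured roughly like this. This is only a minimal sketch: the names (`cube`, `m1`..`m3`, `nx`, `ny`, `nz`), the `float` type, and the plain power-sum moment formulas are placeholders, not my actual code.

```cpp
#include <omp.h>

// One spectrum per loop iteration; OpenMP distributes the spectra over the Phi cores.
void moments_per_spectrum(const float *cube, float *m1, float *m2, float *m3,
                          long nx, long ny, long nz)
{
    const long nspec = nx * ny;              // one spectrum per spatial pixel
    #pragma offload target(mic:0) in(cube : length(nspec * nz)) \
                                  out(m1, m2, m3 : length(nspec))
    #pragma omp parallel for
    for (long s = 0; s < nspec; ++s) {       // s = y*nx + x
        float s1 = 0.f, s2 = 0.f, s3 = 0.f;
        for (long z = 0; z < nz; ++z) {
            float v = cube[z * nspec + s];   // stride of nx*ny between z samples
            s1 += v;
            s2 += v * v;
            s3 += v * v * v;
        }
        m1[s] = s1;  m2[s] = s2;  m3[s] = s3;
    }
}
```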
- I assume the problem is that, although I have coalesced access to global memory, the pixels along each spectrum are not at consecutive memory addresses, and as a result the Phi's vectorization provides no performance improvement. (Remember, each core performs a kind of reduction over its associated spectrum; to be precise, it calculates the 1st, 2nd, and 3rd moments. The two access patterns are sketched after this list.) Does this reasoning make sense?
- If I am not mistaken, to gain performance from SIMD the memory accesses have to be contiguous, right?
- So it seems that on the Xeon Phi it is impossible to achieve perfect coalescing of global memory and at the same time take full advantage of SIMD. Does this make sense, or am I totally wrong?
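To make the concern in the first two bullets concrete, here is the contrast I have in mind (placeholder names again; these are just sketches of the two access patterns, not my actual code):

```cpp
// Unit-stride: consecutive iterations read consecutive addresses,
// so the compiler can vectorize with packed loads.
float sum_along_x(const float *cube, long nx, long ny, long y, long z)
{
    float sum = 0.f;
    for (long x = 0; x < nx; ++x)
        sum += cube[(z * ny + y) * nx + x];      // stride 1
    return sum;
}

// Strided: consecutive iterations are nx*ny elements apart, so
// vectorizing along z would need gather loads and gains little.
float sum_along_z(const float *cube, long nx, long ny, long nz, long x, long y)
{
    float sum = 0.f;
    for (long z = 0; z < nz; ++z)
        sum += cube[(z * ny + y) * nx + x];      // stride nx*ny
    return sum;
}
```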
PS: I am using an Intel Xeon Phi 7120.