For some reason serial code runs faster than SIMD code

219 views Asked by Tracy Maxen At 11 June 2015 at 19:04

For some reason running the simple serial code

for (i = 0; i < 1152*1152; i++) {
    MatrixA3[i] = MatrixA1[i] + z*MatrixA2[i];
}

runs faster than, or at the same speed as, the vectorized equivalent:

for (int i = 0; i < 1152*1152; i += 4) {
    load_data1 = _mm256_load_pd(MatrixA1 + i);
    load_data2 = _mm256_load_pd(MatrixA2 + i);
    // load_z is presumably z broadcast to all four lanes; computes load_z*load_data2 + load_data1
    _mm256_store_pd(MatrixA3 + i,
                    _mm256_fmadd_pd(load_z, load_data2, load_data1));
}

On my Intel i7-4578U with Intel compiler XE 15.0, the former runs in 1.507 milliseconds while the latter finishes in 1.513 milliseconds, with 10000 runs.

In my experience AVX2 intrinsics usually give a significant speedup, but for some reason this loop fails to benefit. What am I doing wrong, please?

There is 1 answer

**RamblingMad** · Answer 1 · 2015-06-11T19:19:56+00:00

What are you doing wrong? Not trusting your compiler.

This is not a case for manual optimization; any respectable compiler could vectorize that loop on its own.
