Why is overwriting slower than writing in C++?

(Useless question: I have asked for it to be deleted.) I have to run a piece of code that manages a video stream from a camera. I am trying to speed it up, and I noticed some odd C++ behaviour. (I have to admit I am realizing that I do not know C++.)

The first piece of code runs faster than the second and the third. Why? Could it be that the stack is almost full?

Faster version

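// Each iteration stores its intermediate value in its own array element.
double* temp = new double[N];
for(int i = 0; i < N; i++){
    temp[i] = operation(x[i],y[i]);
    res = res + (temp[i]*temp[i])*coeff[i];
}
delete[] temp; // release the buffer once the loop is done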

Slower version 1

double temp;
for(int i = 0; i < N; i++){
    temp = operation(x[i],y[i]);
    res = res + (temp*temp)*coeff[i];
}

Slower version 2

for(int i = 0; i < N; i++){
    double temp = operation(x[i],y[i]);
    res = res + (temp*temp)*coeff[i];
} 

EDIT: I realized the compiler was optimizing the product between the elements of coeff and temp. I apologize for the unhelpful question. I will delete this post.
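Since the timings turned out to depend on what the compiler did with the loop, here is a minimal sketch of one way such a loop could be timed while still using its result. Everything named here is an assumption for illustration: `operation` is not shown in the question, so a plain difference stands in for it, and `weighted_sum`, `Timing` and `time_it` are hypothetical helpers, not code from the original program.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Placeholder (assumption): the question does not show operation().
inline double operation(double a, double b) { return a - b; }

// The single-variable loop from the question, wrapped in a function.
double weighted_sum(const std::vector<double>& x,
                    const std::vector<double>& y,
                    const std::vector<double>& coeff) {
    assert(x.size() == y.size() && x.size() == coeff.size());
    double res = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double temp = operation(x[i], y[i]);
        res += (temp * temp) * coeff[i];
    }
    return res;
}

// Hypothetical helper: result of the last run plus total elapsed time.
struct Timing {
    double result;
    double millis;
};

// Calls fn() `repeats` times and measures the whole batch with a
// monotonic clock. The result is returned so that the caller can use it.
template <typename Fn>
Timing time_it(Fn fn, int repeats) {
    const auto start = std::chrono::steady_clock::now();
    double result = 0.0;
    for (int r = 0; r < repeats; ++r) {
        result = fn();
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return {result, elapsed.count()};
}
```

A call such as time_it([&] { return weighted_sum(x, y, coeff); }, 100) returns both the value and the elapsed time. Printing or otherwise using the value makes it harder for the optimizer to discard the loop, although it does not rule out every optimization, so comparing the generated assembly is still the safer check.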

There is 1 answer

valdo (best answer)

This has obviously nothing to do with "writing vs overwriting".

Assuming your results are indeed correct, my guess is that the compiler can vectorize (or at least pipeline) your "faster" version more efficiently.

The difference is that this version allocates storage for temp as an array, so each iteration uses its own element of the array; hence all the iterations can be executed independently.

Your "slow 1" version creates a (sort of) false dependence on a single temp variable. A primitive compiler might "buy" it and produce non-pipelined code.

Your "slow 2" version actually seems to be OK: the loop iterations are independent. Why is it still slower? My guess is that this is due to reuse of the same CPU registers. That is, arithmetic on double is often implemented via FPU stack registers (as on x87 targets), and this causes interference between loop iterations.
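To make the comparison concrete, here is a minimal self-contained sketch of the two storage strategies discussed above. The names `sum_with_array` and `sum_with_scalar` are hypothetical, and `operation` is a placeholder (a plain difference), since the question does not show the real one. Both functions do the same arithmetic in the same order, so on a typical build they should return the same value; only the place where temp lives differs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Placeholder (assumption): the question does not show operation().
inline double operation(double a, double b) { return a - b; }

// Array variant: every iteration writes its own temp[i].
// std::vector stands in for the raw new[] and frees the buffer itself.
double sum_with_array(const std::vector<double>& x,
                      const std::vector<double>& y,
                      const std::vector<double>& coeff) {
    assert(x.size() == y.size() && x.size() == coeff.size());
    std::vector<double> temp(x.size());
    double res = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        temp[i] = operation(x[i], y[i]);
        res += (temp[i] * temp[i]) * coeff[i];
    }
    return res;
}

// Scalar variant: temp is a local declared inside the loop body,
// as in "slow 2"; "slow 1" only moves the declaration outside.
double sum_with_scalar(const std::vector<double>& x,
                       const std::vector<double>& y,
                       const std::vector<double>& coeff) {
    assert(x.size() == y.size() && x.size() == coeff.size());
    double res = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double temp = operation(x[i], y[i]);
        res += (temp * temp) * coeff[i];
    }
    return res;
}
```

Compiling both functions with optimizations enabled and comparing the generated assembly is one way to check whether the compiler really treats them differently, rather than guessing from timings alone.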