I am getting a strange result using global variables. This question was inspired by another question. In the code below if I change
int ncols = 4096;
to
static int ncols = 4096;
or
const int ncols = 4096;
the code runs much faster and the assembly is much simpler.
//c99 -O3 -Wall -fopenmp foo.c
#include <stdlib.h>
#include <stdio.h>
#include <omp.h>

int nrows = 4096;
int ncols = 4096;
//static int ncols = 4096;
char* buff;

void func(char* pbuff, int * _nrows, int * _ncols) {
    for (int i=0; i<*_nrows; i++) {
        for (int j=0; j<*_ncols; j++) {
            *pbuff += 1;
            pbuff++;
        }
    }
}

int main(void) {
    buff = calloc(ncols*nrows, sizeof*buff);
    double dtime = -omp_get_wtime();
    for(int k=0; k<100; k++) func(buff, &nrows, &ncols);
    dtime += omp_get_wtime();
    printf("time %.16e\n", dtime/100);
    return 0;
}
I also get the same result if char* buff is an automatic variable (i.e. not global or static). I mean:
//c99 -O3 -Wall -fopenmp foo.c
#include <stdlib.h>
#include <stdio.h>
#include <omp.h>

int nrows = 4096;
int ncols = 4096;

void func(char* pbuff, int * _nrows, int * _ncols) {
    for (int i=0; i<*_nrows; i++) {
        for (int j=0; j<*_ncols; j++) {
            *pbuff += 1;
            pbuff++;
        }
    }
}

int main(void) {
    char* buff = calloc(ncols*nrows, sizeof*buff);
    double dtime = -omp_get_wtime();
    for(int k=0; k<100; k++) func(buff, &nrows, &ncols);
    dtime += omp_get_wtime();
    printf("time %.16e\n", dtime/100);
    return 0;
}
If I change buff to be a short pointer then the code is fast, and the performance does not depend on whether ncols is static or constant, or on whether buff is automatic. However, when I make buff an int* pointer I observe the same effect as with char*.
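For reference, the short* and int* variants described above presumably look like the sketch below. The names func_short and func_int are hypothetical (the question only shows the char* version), and the comments describe what the aliasing rules permit, not what any particular compiler is guaranteed to do.

```c
/* Hypothetical short* variant. Under the strict aliasing rules, a store
   through a short lvalue may not modify an int object, so the compiler
   is allowed to assume *_nrows and *_ncols stay constant in the loop. */
void func_short(short *pbuff, int *_nrows, int *_ncols) {
    for (int i = 0; i < *_nrows; i++) {
        for (int j = 0; j < *_ncols; j++) {
            *pbuff += 1;
            pbuff++;
        }
    }
}

/* Hypothetical int* variant. A store through an int lvalue could, as far
   as the types are concerned, modify nrows or ncols, so the compiler may
   have to assume the loop bounds can change on every iteration. */
void func_int(int *pbuff, int *_nrows, int *_ncols) {
    for (int i = 0; i < *_nrows; i++) {
        for (int j = 0; j < *_ncols; j++) {
            *pbuff += 1;
            pbuff++;
        }
    }
}
```

Both functions do the same work as func; only the element type of the buffer differs.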
I thought this may be due to pointer aliasing so I also tried
void func(int * restrict pbuff, int * restrict _nrows, int * restrict _ncols)
but it made no difference.
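A complete version of that restrict attempt might look like the following sketch. It keeps the char* buffer of the original program; the name func_restrict is hypothetical. Since restrict is a C99 keyword, this assumes the same C99 mode as the build line in the question.

```c
/* Sketch of the restrict-qualified variant (hypothetical name).
   restrict is a promise from the caller that, during the call, the
   objects reached through these three pointers do not overlap. */
void func_restrict(char *restrict pbuff,
                   int *restrict _nrows,
                   int *restrict _ncols) {
    for (int i = 0; i < *_nrows; i++) {
        for (int j = 0; j < *_ncols; j++) {
            *pbuff += 1;   /* increment the current byte */
            pbuff++;       /* move to the next byte */
        }
    }
}
```

It is called exactly like func, for example func_restrict(buff, &nrows, &ncols).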
Here are my questions:
- When buff is either a char* pointer or an int* global pointer, why is the code faster when ncols is static (local to the file) or is constant?
- Why does buff being an automatic variable instead of global or static make the code faster?
- Why does it make no difference when buff is a short pointer?
- If this is due to pointer aliasing, why does restrict have no noticeable effect?
Note that I'm using omp_get_wtime() simply because it's convenient for timing.
As written, several details of the code let GCC make different assumptions when optimizing; the optimization with the largest impact here is most likely loop vectorization. Therefore:
- The code is faster because its hot part, the loops in func, has been auto-vectorized. When ncols is qualified with static or const, GCC reports this in a note that is visible if you turn on -fopt-info-loop, -fopt-info-vec, or combinations of those with a further -optimized suffix, since it has the same effect.
- In this case, GCC is able to compute the number of iterations, which is intuitively necessary to apply vectorization. This is again due to the linkage of buff, which is external unless specified otherwise. When buff is global, the whole vectorization is immediately skipped, unlike when buff is local, where it carries on and succeeds.
- Why should it? func accepts a char*, which may alias anything.
- I don't think so, because GCC can see that they don't alias when func is invoked: restrict isn't needed.
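To illustrate the point about the iteration count, here is a minimal sketch, not taken from the answer, that copies the bounds into local variables before the loops. The name func_hoisted and the build line in the comment are assumptions. Because a local whose address is never taken cannot be modified by a store through pbuff, the trip count is fixed on entry. Note that this changes behavior in the (pathological) case where the buffer really does overlap nrows or ncols.

```c
/* Possible build line for inspecting the vectorizer's decisions,
   assuming GCC and a source file named foo.c:
     gcc -std=c99 -O3 -fopt-info-vec-optimized -fopt-info-vec-missed -c foo.c */

/* Hypothetical variant of func that reads each bound exactly once. */
void func_hoisted(char *pbuff, const int *_nrows, const int *_ncols) {
    const int nr = *_nrows;   /* loaded once, before any store */
    const int nc = *_ncols;   /* loaded once, before any store */
    for (int i = 0; i < nr; i++) {
        for (int j = 0; j < nc; j++) {
            *pbuff += 1;      /* cannot affect nr or nc */
            pbuff++;
        }
    }
}
```

Whether this version is actually vectorized depends on the compiler version and flags, so it is worth checking the -fopt-info output rather than assuming it.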