Parallel for loop with reduction and manipulating arrays

165 views Asked by At

I'm new to openMP and I try to optimize for loop. The result is not as expected, the for loops are not working correctly (due to dependency). I don't understand how to get a perfect parallel loop with the examples below :

    #pragma omp parallel for default(shared) reduction(+...)
    for(i = rest - 1; i >= 0; i--) {
        scounts[i] += N;
    }

    #pragma omp parallel for private(i)
    for(i = 1; i < p; i++) {
        disp[i] = disp[i-1] + scounts[i-1];
    }

I tried these 2 pragma directives without any succes. What is the best way to proceed in these cases ?

2

There are 2 answers

0
Z boson On BEST ANSWER

You have already picked a hard problem to do in parallel. In general when writing an array you don't want elements of the array to depend on previous elements which is exactly what you have in your second loop.

Most people give up when they see a dependency. But these are the interesting cases which require a bit of thinking. In your case you second loop is equivalent to

type sum = 0; //replace type with int, float, double...
for(i = 1; i < p; i++) {
    sum += scounts[i-1];
    disp[i] = disp[0] + sum;
}

This is a cumulative sum (aka prefix sum). OpenMP does not provide easy constructs to do the prefix sum. You have to do it in two passes. Here is how you do it (I assumed the type of disp and scounts is int but you can replace it with float or whatever):

int *suma;
#pragma omp parallel
{
    int ithread = omp_get_thread_num();
    int nthreads = omp_get_num_threads();
    #pragma omp single
    {
        suma = malloc(nthreads * sizeof *suma);
        suma[0] = 0;
    }
    int sum = 0;
    #pragma omp for schedule(static)
    for (int i=1; i<p; i++) {
        sum += scounts[i-1];
        disp[i] = disp[0] + sum;
    }
    suma[omp_get_thread_num()+1] = sum;
    #pragma omp barrier
    int offset = 0;
    for(int i=0; i<(ithread+1); i++) {
        offset += suma[i];
    }
    #pragma omp for schedule(static)
    for(int i=1; i<p; i++) {
        disp[i] += offset;
    }
}
free(suma);

But if you're just learning OpenMP I suggest you start with an easier case first.

2
dlask On

Please use #pragma directly:

#pragma omp parallel ...

instead of #pragma in comment:

// #pragma omp parallel ...