I'm new to OpenMP and I'm trying to parallelize some for loops. The results are not what I expected: the loops do not work correctly (because of a dependency). I don't understand how to get a correct parallel loop for the examples below:
#pragma omp parallel for default(shared) reduction(+...)
for(i = rest - 1; i >= 0; i--) {
    scounts[i] += N;
}
#pragma omp parallel for private(i)
for(i = 1; i < p; i++) {
    disp[i] = disp[i-1] + scounts[i-1];
}
I tried these two pragma directives without any success. What is the best way to proceed in these cases?
You have already picked a hard problem to do in parallel. In general, when writing to an array you don't want elements of the array to depend on previous elements, which is exactly what you have in your second loop.
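By contrast, the first loop in the question has no such dependency: each iteration touches only its own scounts[i]. A minimal sketch of how it could be parallelized is below. The function name add_to_counts and its parameter list are hypothetical, chosen only to make the snippet self-contained. As far as the loop shown goes, no reduction clause appears to be needed, because nothing is accumulated into a shared scalar.

```c
/* Hypothetical wrapper around the first loop from the question.
 * Every iteration writes a different element, so the iterations are
 * independent and can run in any order on any thread. */
void add_to_counts(int *scounts, int rest, int N)
{
    /* The loop variable declared in the for statement is private
     * to each thread automatically. */
    #pragma omp parallel for
    for (int i = 0; i < rest; i++) {
        scounts[i] += N;
    }
}
```

The original loop counts down from rest - 1 to 0; since the order of the iterations does not matter here, counting up gives the same result.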
Most people give up when they see a dependency, but these are the interesting cases, which require a bit of thinking. In your case, your second loop is equivalent to keeping a running total of scounts and writing it into disp as you go.
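To make the dependency explicit, the second loop can be written serially with a running total. This is only a reference sketch; the function name prefix_sum_serial and the local variable sum are hypothetical, and int is assumed as the element type.

```c
/* Serial reference for the second loop in the question:
 * produces the same values as disp[i] = disp[i-1] + scounts[i-1]. */
void prefix_sum_serial(int *disp, const int *scounts, int p)
{
    int sum = disp[0];              /* running total starts at disp[0] */
    for (int i = 1; i < p; i++) {
        sum += scounts[i - 1];      /* depends on every earlier iteration */
        disp[i] = sum;
    }
}
```

Written this way, it is easier to see that sum carries a value from one iteration to the next, which is why a plain parallel for gives wrong results.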
This is a cumulative sum (aka prefix sum). OpenMP does not provide easy constructs to do the prefix sum (at least before the scan directive added in OpenMP 5.0), so you have to do it in two passes: each thread first sums its own chunk of the array, and then each chunk is shifted by the totals of the chunks before it. (I assumed the type of disp and scounts is int, but you can replace it with float or whatever.)

But if you're just learning OpenMP, I suggest you start with an easier case first.
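For completeness, the two-pass prefix sum could be sketched in C roughly as follows. This is a sketch under stated assumptions, not necessarily the exact code the answer had in mind: the function name prefix_sum_disp and the helper array partial are hypothetical, int is assumed as the element type, and the code assumes that schedule(static) without a chunk size hands each thread one contiguous chunk in increasing thread order, which common implementations do. The _OPENMP guards let the same code compile and run serially when OpenMP is not enabled.

```c
#include <stdlib.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: computes disp[i] = disp[i-1] + scounts[i-1]
 * for i = 1..p-1, leaving disp[0] unchanged.
 * Returns 0 on success, -1 if the scratch array cannot be allocated. */
int prefix_sum_disp(int *disp, const int *scounts, int p)
{
    if (p < 2) return 0;
#ifdef _OPENMP
    const int max_threads = omp_get_max_threads();
#else
    const int max_threads = 1;
#endif
    /* partial[t + 1] will hold the total of thread t's chunk; partial[0] = 0 */
    int *partial = calloc((size_t)max_threads + 1, sizeof *partial);
    if (partial == NULL) return -1;
    const int base = disp[0];

    #pragma omp parallel
    {
#ifdef _OPENMP
        const int ithread = omp_get_thread_num();
#else
        const int ithread = 0;
#endif
        int sum = 0;
        /* Pass 1: each thread computes a prefix sum local to its chunk. */
        #pragma omp for schedule(static)
        for (int i = 1; i < p; i++) {
            sum += scounts[i - 1];
            disp[i] = sum;
        }
        partial[ithread + 1] = sum;
        /* Wait until every thread has stored its chunk total. */
        #pragma omp barrier

        /* Offset = disp[0] plus the totals of all earlier chunks
         * (assumes chunks are assigned in thread order, see above). */
        int offset = base;
        for (int t = 0; t <= ithread; t++) {
            offset += partial[t];
        }
        /* Pass 2: same bounds and schedule, so each thread revisits
         * exactly the elements it wrote in pass 1. */
        #pragma omp for schedule(static)
        for (int i = 1; i < p; i++) {
            disp[i] += offset;
        }
    }
    free(partial);
    return 0;
}
```

A call such as prefix_sum_disp(disp, scounts, p) would replace the second loop from the question. For small p the serial loop is likely faster, since the array is traversed twice and the threads have to synchronize.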