Can I put multiple ordered statements in one ordered for loop (OpenMP)?

I just found out that while this C code gives an ordered list of integers (as expected):

#include <stdio.h>
#include <unistd.h>
#include <omp.h>

int main() {
#pragma omp parallel for ordered schedule(dynamic)
  for (int i=0; i<10; i++) {
#pragma omp ordered
    {
    printf("%i             (tid=%i)\n",i,omp_get_thread_num()); fflush(stdout);
    }
  }
}

With both gcc and icc, however, the following gives undesired behaviour:

#include <stdio.h>
#include <unistd.h>
#include <omp.h>

int main() {
#pragma omp parallel for ordered schedule(dynamic)
  for (int i=0; i<10; i++) {
#pragma omp ordered
    {
    printf("%i             (tid=%i)\n",i,omp_get_thread_num()); fflush(stdout);
    }

    usleep(100*omp_get_thread_num());
    printf("WORK IS DONE  (tid=%i)\n",omp_get_thread_num()); fflush(stdout);
    usleep(100*omp_get_thread_num());

#pragma omp ordered
    {
    printf("  %i           (tid=%i)\n",i,omp_get_thread_num()); fflush(stdout);
    }
  }
} 

What I'd love to see is:
0
1
2
3
4
5
6
7
8
9
WORK IS DONE
WORK IS DONE
WORK IS DONE
WORK IS DONE
WORK IS DONE
WORK IS DONE
WORK IS DONE
WORK IS DONE
WORK IS DONE
WORK IS DONE
0
1
2
3
4
5
6
7
8
9

But with gcc I get:
0 (tid=5)
WORK IS DONE (tid=5)
0 (tid=5)
1 (tid=2)
WORK IS DONE (tid=2)
1 (tid=2)
2 (tid=0)
WORK IS DONE (tid=0)
2 (tid=0)
3 (tid=6)
WORK IS DONE (tid=6)
3 (tid=6)
4 (tid=7)
WORK IS DONE (tid=7)
4 (tid=7)
5 (tid=3)
WORK IS DONE (tid=3)
5 (tid=3)
6 (tid=4)
WORK IS DONE (tid=4)
6 (tid=4)
7 (tid=1)
WORK IS DONE (tid=1)
7 (tid=1)
8 (tid=5)
WORK IS DONE (tid=5)
8 (tid=5)
9 (tid=2)
WORK IS DONE (tid=2)
9 (tid=2)
(so everything gets ordered, even the parallelizable work part)

And with icc:
1 (tid=0)
2 (tid=5)
3 (tid=1)
4 (tid=2)
WORK IS DONE (tid=1)
WORK IS DONE (tid=3)
3 (tid=1)
6 (tid=4)
7 (tid=7)
8 (tid=1)
WORK IS DONE (tid=0)
5 (tid=6)
WORK IS DONE (tid=2)
1 (tid=0)
9 (tid=0)
WORK IS DONE (tid=0)
WORK IS DONE (tid=5)
WORK IS DONE (tid=1)
9 (tid=0)
0 (tid=3)
8 (tid=1)
WORK IS DONE (tid=4)
WORK IS DONE (tid=6)
2 (tid=5)
WORK IS DONE (tid=7)
6 (tid=4)
5 (tid=6)
4 (tid=2)
7 (tid=7)
(so nothing gets ordered, not even the ordered clauses)

Is using multiple ordered clauses within one ordered loop undefined behaviour, or what is going on here? I couldn't find anything disallowing multiple clauses per loop in any of the OpenMP documentation I could find.

I know that in this trivial example I could just part the loops like

int main() {  
  for (int i=0; i<10; i++) {  
    printf("%i             (tid=%i)\n",i,omp_get_thread_num()); fflush(stdout);  
  }  
#pragma omp parallel for schedule(dynamic)  
  for (int i=0; i<10; i++) {  
    usleep(100*omp_get_thread_num());  
    printf("WORK IS DONE  (tid=%i)\n",omp_get_thread_num()); fflush(stdout);  
    usleep(100*omp_get_thread_num());  
  }  
  for (int i=0; i<10; i++) {  
    printf("  %i           (tid=%i)\n",i,omp_get_thread_num()); fflush(stdout);  
  }          
}  

So I'm not looking for a workaround. I really want to understand what is going on here, so that I can handle the real situation without running into anything devastating/unexpected.

I really hope you can help me.

There are 2 answers

Answer by skol (best answer):

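According to the OpenMP 4.0 API specification, you can't.

Only one ordered clause can appear on a loop directive (p. 58)

More directly relevant to the code in the question, the restrictions on the ordered construct in the same specification state that, during execution of an iteration of a loop, a thread must not execute more than one ordered region that binds to the same loop region. A loop body that enters two ordered regions per iteration is therefore non-conforming, so gcc and icc are free to behave differently.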

Answer by AudioBubble:

I am fairly new to parallel programming, but I will try to help you.

I have modified your code and tested this one:

#include <stdio.h>
#include <unistd.h>
#include <omp.h>

int main() {

  #pragma omp parallel num_threads(8)
  {

    #pragma omp for ordered schedule(dynamic)
    for (int i=0; i<10; i++) {

      #pragma omp ordered
      {
        printf("%i (tid=%i) \n",i,omp_get_thread_num()); fflush(stdout);
      }

    }

    printf("WORK IS DONE  (tid=%i)\n",omp_get_thread_num()); fflush(stdout);

  }


}

Adapt the number of threads to the machine you are using. The problem in your code is that the printf reporting that the work is done is not synchronized in any way: every thread executes that part independently, so those lines appear in arbitrary order.

In my example, the iterations of the for loop print in order, as the ordered construct requires. The implicit barrier at the end of the for construct then keeps every thread waiting until all of them have finished the loop, and only then does each thread print "WORK IS DONE". If you are not using a for construct and want the same effect, you can use an explicit barrier, in other words #pragma omp barrier.

Note: "#pragma omp parallel" also has an implicit barrier at its end, after which only the master thread continues executing.

Here is a possible output I obtained:

0 (tid=7) 
1 (tid=5) 
2 (tid=0) 
3 (tid=4) 
4 (tid=1) 
5 (tid=3) 
6 (tid=2) 
7 (tid=7) 
8 (tid=5) 
9 (tid=0) 

WORK IS DONE  (tid=5)
WORK IS DONE  (tid=2)
WORK IS DONE  (tid=1)
WORK IS DONE  (tid=4)
WORK IS DONE  (tid=0)
WORK IS DONE  (tid=7)
WORK IS DONE  (tid=3)
WORK IS DONE  (tid=6)

If this is the kind of output you would like to see, this is a possible way of achieving it. Hope this helps, and do not hesitate to ask for further help if necessary. Keep coding!