OpenMP nested parallelism and num_threads(1)


After spending an unreasonable amount of time on this, I found out that even with nested parallelism disabled in OpenMP, the inner parallel region in the following sample still runs in parallel:

#pragma omp parallel num_threads(1)
{
    printf("BBB thread %d/%d level %d %d\n", omp_get_thread_num(), omp_get_num_threads(), omp_get_active_level(), omp_get_level());

    #pragma omp parallel
    {
        printf("CCC thread %d/%d level %d %d\n", omp_get_thread_num(), omp_get_num_threads(), omp_get_active_level(), omp_get_level());
    }
}

Yes, I did set num_threads to 1, but it is still a parallel region. Why does it not behave as one where nesting is concerned? Why is (an equivalent of) omp_get_active_level() used to determine the nesting depth, instead of omp_get_level()? It just does not make sense to me. Why does num_threads(3) behave analogously to num_threads(2), while num_threads(1) behaves differently?

Is this behavior expected? I tested with the g++ and icpx compilers, and both behave the same way.

if(false) has the same effect as num_threads(1), but that is to be expected, since with it you explicitly say that you don't want to launch a parallel region. It still affects omp_get_level(), though, which seems weird.

I did read this algorithm, so this is more a question of why it is designed this way.

By the way, this is the output I get with OMP_NUM_THREADS=4 (AAA is completely outside any parallel region):

AAA thread 0/1 level 0 0

num_threads(3):
BBB thread 1/3 level 1 1
CCC thread 0/1 level 1 2
BBB thread 0/3 level 1 1
CCC thread 0/1 level 1 2
BBB thread 2/3 level 1 1
CCC thread 0/1 level 1 2

num_threads(2):
BBB thread 1/2 level 1 1
CCC thread 0/1 level 1 2
BBB thread 0/2 level 1 1
CCC thread 0/1 level 1 2

num_threads(1):
BBB thread 0/1 level 0 1
CCC thread 0/4 level 1 2
CCC thread 1/4 level 1 2
CCC thread 2/4 level 1 2
CCC thread 3/4 level 1 2

and the full program:

#include <cstdio>
#include <omp.h>

int main()
{
    printf("AAA thread %d/%d level %d %d\n", omp_get_thread_num(), omp_get_num_threads(), omp_get_active_level(), omp_get_level());

    printf("\nnum_threads(3):\n");
    #pragma omp parallel num_threads(3)
    {
        printf("BBB thread %d/%d level %d %d\n", omp_get_thread_num(), omp_get_num_threads(), omp_get_active_level(), omp_get_level());

        #pragma omp parallel
        {
            printf("CCC thread %d/%d level %d %d\n", omp_get_thread_num(), omp_get_num_threads(), omp_get_active_level(), omp_get_level());
        }
    }

    printf("\nnum_threads(2):\n");
    #pragma omp parallel num_threads(2)
    {
        printf("BBB thread %d/%d level %d %d\n", omp_get_thread_num(), omp_get_num_threads(), omp_get_active_level(), omp_get_level());

        #pragma omp parallel
        {
            printf("CCC thread %d/%d level %d %d\n", omp_get_thread_num(), omp_get_num_threads(), omp_get_active_level(), omp_get_level());
        }
    }

    printf("\nnum_threads(1):\n");
    #pragma omp parallel num_threads(1)
    {
        printf("BBB thread %d/%d level %d %d\n", omp_get_thread_num(), omp_get_num_threads(), omp_get_active_level(), omp_get_level());

        #pragma omp parallel
        {
            printf("CCC thread %d/%d level %d %d\n", omp_get_thread_num(), omp_get_num_threads(), omp_get_active_level(), omp_get_level());
        }
    }

    return 0;
}

There is 1 answer

Answer by Joachim

Disabling nesting is equivalent to setting max-active-levels-var to 1, either through the environment variable (OMP_MAX_ACTIVE_LEVELS=1) or through the runtime function (omp_set_max_active_levels(1)). A parallel region that executes with a single thread is defined as an inactive parallel region, so it does not count towards the max-active-levels-var limit. That is why the region nested inside num_threads(1) is the first active level and still gets a full team, and also why omp_get_level() counts the outer region while omp_get_active_level() does not.

As other comments suggested, the num_threads clause should only be used when really necessary. A more flexible way is to export OMP_NUM_THREADS=1,4 to get the output of your last experiment.
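To make the rule concrete, here is a minimal sketch that models it in plain C++, with no OpenMP calls. It is a simplification, not the specification's full thread-count algorithm: it ignores dyn-var, thread limits and whatever the implementation is actually able to provide. The names RegionState, initial_state, enter_parallel and simulate_nest are hypothetical and exist only for this illustration.

```cpp
#include <vector>

// Hypothetical model type; none of these names come from the OpenMP API.
struct RegionState {
    int level;         // what omp_get_level() would report
    int active_level;  // what omp_get_active_level() would report
    int team_size;     // what omp_get_num_threads() would report
};

// State outside any parallel region (the "AAA" line in the question).
inline RegionState initial_state() { return {0, 0, 1}; }

// Simplified rule for entering one parallel region (an assumption-laden
// model): the region gets more than one thread only if its if clause is
// true and the number of enclosing ACTIVE regions is still below
// max-active-levels-var. A team of one thread is an inactive region, so
// it raises level but not active_level.
inline RegionState enter_parallel(const RegionState& parent,
                                  int requested_threads,
                                  int max_active_levels,
                                  bool if_clause = true)
{
    int team = 1;
    if (if_clause && parent.active_level < max_active_levels &&
        requested_threads > 1) {
        team = requested_threads;
    }
    const int active = parent.active_level + (team > 1 ? 1 : 0);
    return {parent.level + 1, active, team};
}

// Enter a nest of regions from outermost to innermost; requested[i] is the
// thread count requested for nesting depth i (as with OMP_NUM_THREADS=1,4).
inline std::vector<RegionState> simulate_nest(const std::vector<int>& requested,
                                              int max_active_levels)
{
    std::vector<RegionState> states;
    RegionState current = initial_state();
    for (int r : requested) {
        current = enter_parallel(current, r, max_active_levels);
        states.push_back(current);
    }
    return states;
}
```

Under this model, simulate_nest({3, 4}, 1) gives team sizes 3 and then 1, while simulate_nest({1, 4}, 1) gives 1 and then 4 with the outer region at level 1 but active level 0. That matches the num_threads(3) and num_threads(1) outputs in the question.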