Same class, 2 programs, different OpenMP speedups; MSVC2017


I have a C++ class, several of whose functions have OpenMP parallel for loops. I'm building it into two apps with MSVC2017, and find that one of those functions runs differently in the 2 apps. The function has two separate parallel for loops. In one build, the VS debugger shows them both using 7 cores for a solid second while processing a block of test data; in the other, it shows just two blips of multicore usage, presumably at the beginning of each parallel section, but only 1 processor runs most of the time.

These functions are deep inside the code for the class, which is identical in the 2 apps. The builds have the same compiler and linker options so far as I can see. I generate the projects with CMake and never modify them by hand.

Can anyone suggest possible reasons for this behavior? I am fully aware of other ways to parallelize code, so please don't tell me about those. I am just looking for expertise on OpenMP under MSVC.


There is 1 answer

Answer by Jim Cownie

I expect the two calls are passing in significantly different amounts of work. Consider code like the following (a trivial example, typed into this post, not compiled, and not the way you should write this):

void scale(int n, double *d, double f) { 
#pragma omp parallel for
    for (int i=0; i<n; i++)
        d[i] = d[i] * f;
}

If invoked with a large vector where n == 10000, you'll get some parallelism, with many threads working. If called with n == 3, there is obviously only work for three threads! If you use #pragma omp parallel for schedule(dynamic), it's quite possible that even with ten or twenty iterations a single thread will execute most of them, because each iteration is so cheap that the first thread to arrive can claim most of the work before the others have started.
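
One way to check whether this is what is happening in your two apps is to count how many iterations each thread actually executes. The following is only a sketch built on the scale example: iterations_per_thread and report are hypothetical helper names, not part of your class or of any library, and the #ifdef _OPENMP guards are there only so the code still builds when OpenMP is switched off (with MSVC, OpenMP is enabled by /openmp).

```cpp
#include <cassert>
#include <cstdio>
#include <numeric>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: same loop as scale(), but it also returns
// how many iterations each thread executed.
std::vector<int> iterations_per_thread(int n, double *d, double f) {
    assert(n >= 0);
#ifdef _OPENMP
    const int max_threads = omp_get_max_threads();
#else
    const int max_threads = 1;  // no OpenMP: the loop runs serially
#endif
    std::vector<int> counts(max_threads, 0);

#pragma omp parallel
    {
#ifdef _OPENMP
        const int tid = omp_get_thread_num();
#else
        const int tid = 0;
#endif
        int local = 0;  // private to each thread
#pragma omp for
        for (int i = 0; i < n; i++) {
            d[i] = d[i] * f;
            local++;
        }
        counts[tid] = local;  // each thread writes only its own slot
    }
    return counts;
}

// Hypothetical helper: print the per-thread counts and their total.
void report(const char *label, const std::vector<int> &counts) {
    std::printf("%s:", label);
    for (int c : counts)
        std::printf(" %d", c);
    std::printf(" (total %d)\n", std::accumulate(counts.begin(), counts.end(), 0));
}
```

Calling iterations_per_thread from both apps with the real n, and passing the result to report, should show whether one app is simply handing the loop far fewer iterations than the other, or whether one thread is ending up with nearly all of them.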

In summary: context matters.