std::execution::par does not spawn threads


I am trying to parallelize the processing of problems contained in a vector. I would first like to avoid using std::thread directly, given that the standard library provides parallel algorithms that should cover my use case. My understanding of how to apply them is this:

std::vector<SolutionT> solution_vector;
std::mutex mut;
std::for_each(std::execution::par, problems.begin(), problems.end(), [&](const auto& problem) {
   auto solution = do_heavy_work(problem);  // heavy part runs outside the lock
   std::lock_guard guard(mut);              // serialize access to the shared vector
   solution_vector.emplace_back(solution);
});
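For comparison, the same loop can be sketched without the mutex by writing each result into a pre-sized output vector with std::transform. This is only a sketch under assumptions: ProblemT, the concrete SolutionT alias, the body of do_heavy_work, and the helper name solve_all are hypothetical stand-ins, and SolutionT is assumed to be default-constructible.

```cpp
#include <algorithm>
#include <execution>
#include <vector>

// Hypothetical stand-ins for the types in the question (assumptions).
using ProblemT = int;
using SolutionT = long;

// Placeholder for the real do_heavy_work: here it just squares the input.
SolutionT do_heavy_work(const ProblemT& problem) {
    return static_cast<SolutionT>(problem) * problem;
}

// Solves every problem and stores solution i at index i.
// Each invocation writes only to its own slot, so no mutex is needed.
std::vector<SolutionT> solve_all(const std::vector<ProblemT>& problems) {
    std::vector<SolutionT> solutions(problems.size());
    std::transform(std::execution::par, problems.begin(), problems.end(),
                   solutions.begin(),
                   [](const ProblemT& p) { return do_heavy_work(p); });
    return solutions;
}
```

Because every element writes to its own index, the solutions come out in the same order as the problems, which the emplace_back version does not guarantee.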

In CMake I also link the threads library to the executable (although I am unsure whether this is necessary):

find_package(Threads REQUIRED)
target_link_libraries(
        executable 
        PRIVATE 
        Threads::Threads
)

The solving time for a problem ranges from mere milliseconds to tens of seconds. Since I have a thousand problems in my vector and most are of the larger kind, I am expecting parallelism to speed up the processing a fair bit.

Yet, when I run the code I do not observe any threads being spawned. I check the processes with htop and observe only a single core being pushed to 100% while the rest are idle.

Am I wrong in expecting threads to be spawned in combination with std::execution::par? Am I missing a step to get this right?

This was tested on an 18-core Intel x86_64 platform running Ubuntu 22.04, with GCC 11.4 and C++20.

Edit: Here is a godbolt example that reflects my usage: https://godbolt.org/z/EEMEWr3Wa

1 answer

Answer by Damir Tenishev:

According to the execution_policy documentation:

If the implementation cannot parallelize or vectorize (e.g. due to lack of resources), all standard execution policies can fall back to sequential execution.
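One way to check whether this fallback is happening, besides watching htop, is to record which threads actually execute the loop body. The following is a minimal sketch; count_worker_threads is a hypothetical helper, not part of any library. As far as I know, GCC's libstdc++ implements the parallel policies on top of Intel TBB, so without the TBB headers installed and the program linked with -ltbb it is expected to run serially and the helper would report a single thread.

```cpp
#include <algorithm>
#include <cstddef>
#include <execution>
#include <mutex>
#include <set>
#include <thread>
#include <vector>

// Hypothetical diagnostic helper: runs a parallel for_each over n dummy
// items and returns how many distinct threads executed the loop body.
std::size_t count_worker_threads(std::size_t n) {
    std::vector<int> items(n, 0);
    std::set<std::thread::id> seen;
    std::mutex mut;
    std::for_each(std::execution::par, items.begin(), items.end(), [&](int&) {
        // Real work would go here, before taking the lock.
        std::lock_guard<std::mutex> guard(mut);
        seen.insert(std::this_thread::get_id());
    });
    return seen.size();
}
```

A result of 1 on a workload with many non-trivial items suggests the algorithm ran sequentially. The dummy body here is very cheap, so even a working parallel backend may finish it on few threads; placing the real work inside the lambda gives a more meaningful count.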

See the generated code from your link (line 228 of the generated assembly onward):

movaps XMMWORD PTR [rsp+0x30],xmm0
 movdqa xmm0,XMMWORD PTR [rip+0x0]        # 230 <heavy_work[abi:cxx11](std::shared_ptr<int> const&)+0x1c0>
    R_X86_64_PC32 .LC5-0x4

Instead of spawning threads, it appears to apply vectorization.

Why vectorization is applied with std::execution::par could be a good question in its own right.

The recommended reading here might include this paper.