Last notify_all isn't triggering last conditional_variable.wait

57 views Asked by At

What I'm Trying To Do

Hi, I have two types of threads the main one and the workers where the workers are equal to the number of cores on the CPU, what I'm trying to do is when the main thread needs to call an update I set a boolean called Updating to true and call condition_variable(cv).notify_all then each thread will do its work and when done it will increment by one an atomic_int called CoresCompleted followed by a cv.notify_all so that the main thread can check if all the work is done then it will wait for the variable Updating to be false so it is sure that all other threads finished and it doesn't update again, once everything is done the main thread sets updating to false and notifies all.

CODE

Main

void UpdateManager::Update() {

    //Prepare Update
    CoresCompleted = 0;
    Updating = true;

    //Notify Update Started
    cv.notify_all();

    //Wait for Update to end
    auto Pre = high_resolution_clock::now();
    cv.wait(lk, [] { return (int)UpdateManager::CoresCompleted >= (int)UpdateManager::ProcessorCount; });
    auto Now = high_resolution_clock::now();
    auto UpdateTime = duration_cast<nanoseconds>(Now - Pre);
    
    //End Update and nofity threads
    Updating = false;
    cv.notify_all();
}

Workers

void CoreGroup::Work() {

    Working = true;
    unique_lock<mutex> lk(UpdateManager::m);

    while (Working) {
        
        //Wait For Update To Start
        UpdateManager::cv.wait(lk, []{ return UpdateManager::Updating; });

        if (!Working)
            return;

        //Do Work
        size_t Size = Groups.size();

        auto Pre = high_resolution_clock::now();

        for (size_t Index = 0; Index < Size; Index++)
            Groups[Index]->Update();

        auto Now = high_resolution_clock::now();
        UpdateTime = duration_cast<nanoseconds>(Now - Pre);
        
        //Increment CoresCompleted And Notify All
        UpdateManager::CoresCompleted++;
        UpdateManager::cv.notify_all();

        //Wait For Update To End
        UpdateManager::cv.wait(lk, []{ return !UpdateManager::Updating; });
    }
}

Problem

Once the workers reach the last wait where they wait for Updating to be false they get stuck and never leave, for some reason the last notify_all in the main thread is not reaching the workers, I tried searching and looked for many examples but I can't figure out why it isn't triggering, maybe I miss understood how the cv and lock works, any ideas why this is happening and how to fix?

1

There are 1 answers

2
ixSci On BEST ANSWER

Here is how your code works: some waiting in Update is over when notified:

cv.wait(lk, [] { return (int)UpdateManager::CoresCompleted >= (int)UpdateManager::ProcessorCount; });

It goes out of waiting and requires the lock on the mutex. Proceed to do its stuff then reaches the end and notifies the other thread that they can continue working with this line:

cv.notify_all();

But it lies, they can't continue working because you hold the lock. Release it and they will proceed working:

void UpdateManager::Update() {
    <...>
    //End Update and nofity threads
    Updating = false;
    lk.unlock();
    cv.notify_all();
}

That probably isn't the only issue in this code but I assume that you lock the mutex before entering the Update method or have some guarantee that it runs before the other one (Work).