Parallel.For is only utilising the CPU 10% all the time

116 views Asked by At

So, I was trying speed up the my workflow by applying parallelism to my algorithm. I have a list of data, and I am given a window size that is less than the count of that data list.

My task is to apply a window on my data, and then I need to do some comparison to determine whether that window has bad data or not by returning a boolean. Then, I slide that window down by just 1, repeat, and continue the process until the list is finish.

The problem is it starts to get very slow when my window size is big. It will be something like O(N*W^2), where N is the data size, and W is the window size. I need to copy W size of data into a sublist(GetRange), and then loop through that sublist to compare.

So I try to apply parallelism by putting that code in a Parallel.For. However, when I look at the CPU core usage it just shows ~10% utilisation, and I can see some of the cores are just being idle. How do I maximize all my CPU power in this case? I have got like 16 cores on my machine.

public static double CountBadData(List<double> data, int someWindowSize, double limit)
{
    var badCount = 0;
    var totalSlidings = data.Count - someWindowSize + 1;
    Parallel.For(0, totalSlidings, i =>
    {
        if (!DetermineGoodOrBad(data.GetRange(i,someWindowSize), limit))
            Interlocked.Increment(ref badCount);
    });
    
    return badCount;
}

private bool DetermineGoodOrBad(List<double> subData, double limit)
{
    foreach(var data in subData)
    {
        if (data > limit) return false;
    }
    
    return true;
}
2

There are 2 answers

1
JonasH On BEST ANSWER

I'm not going to try to answer why the parallelism works poorly. There might be bandwith or caching issues, or that the CPUs spend time transferring ownership of the badCount variable. You will likely need to profile your program to get a real answer.

I'm instead going to try answering the implied question, how to make the code faster.

One way to do this is to keep track of how many bad values exist within the sliding window, so when we move the window one step we only need to check the values that where added and removed from the window, and decriment/increment the running count accordingly:

public static double CountBadData(List<double> data, int someWindowSize, double limit)
{
    var badValuesInWindow = 0;
    var totalSlidings = data.Count - someWindowSize + 1;
    var runningBadCount = 0;
    for (int i = 0; i < someWindowSize; i++)
    {
        if (data[i] > limit)
        {
            runningBadCount++;
        }
    }
    for (var i = 0; i < totalSlidings; i++)
    {
        if (data[i] > limit)
        {
            runningBadCount--;
        }
        if (data[i + someWindowSize] > limit)
        {
            runningBadCount++;
        }
        if (runningBadCount > 0)
        {
            badValuesInWindow++;
        }
    }
    return badValuesInWindow;
}

Be aware that this code is not tested. I would assume that there are a few of-by-one errors.

0
Theodor Zoulias On

I agree with JonasH that optimizing the algorithm is a more promising path for solving your problem, than trying to parallelize an unoptimized algorithm. For the sake of completeness, the parallelization attempt could be improved by avoiding to create physical slices of the List<double> data with the List<T>.GetRange method, and instead passing the whole list with indexes denoting the beginning and end of a virtual slice:

private bool IsGoodData(List<double> data, int from, int toExclusive, double limit)
{
    for (int i = from; i < toExclusive; i++)
        if (data[i] > limit) return false;
    return true;
}

Then the CountBadData can be implemented like this:

public double CountBadData(List<double> data, int windowSize, double limit)
{
    var badCount = 0;
    var totalSlidings = data.Count - windowSize + 1;
    ParallelOptions options = new() { MaxDegreeOfParallelism = Environment.ProcessorCount };
    Parallel.For(0, totalSlidings, options, i =>
    {
        if (!IsGoodData(data, i, i + windowSize, limit))
            Interlocked.Increment(ref badCount);
    });
    return badCount;
}

It is assumed that the IsGoodData method is restricted to just reading the data list. Otherwise, modifying the same List<T> from multiple threads concurrently would result in undefined behavior.