How to parallelize a "while" loop using PPL


I need to parallelize a "while" loop using PPL. I have the following code, written in Visual C++ with MS VS 2013.

int WordCount::CountWordsInTextFiles(basic_string<char> p_FolderPath, vector<basic_string<char>>& p_TextFilesNames)
{
    // Word counter in all files.
    atomic<unsigned> wordsInFilesTotally(0);
    // Critical section.
    critical_section cs;

    // Set specified folder as current folder.
    ::SetCurrentDirectory(p_FolderPath.c_str());

    // Concurrent iteration through p_TextFilesNames vector.
    parallel_for(size_t(0), p_TextFilesNames.size(), [&](size_t i)
    {
        // Create a stream to read from file.
        ifstream fileStream(p_TextFilesNames[i]);
        // Check if the file is opened
        if (fileStream.is_open())
        {
            // Word counter in a particular file.
            unsigned wordsInFile = 0;

            // Read from file. Testing the extraction itself stops the loop at
            // end of file without counting a failed read as a word.
            string word;
            while (fileStream >> word)
            {
                // Count total number of words in all files.
                wordsInFilesTotally++;
                // Count total number of words in a particular file.
                wordsInFile++;
            }

            // Verify the values.
            cs.lock();
            cout << endl << "In file " << p_TextFilesNames[i] << " there are " << wordsInFile << " words" << endl;
            cs.unlock();
        }
    });
    // No explicit cleanup is needed: cs is destroyed automatically when it goes out of scope.

    // Return total number of words in all files in the folder.
    return wordsInFilesTotally;
}

This code iterates over a std::vector in parallel in the outer loop, using the concurrency::parallel_for() algorithm. It also has a nested "while" loop that reads from each file, and I need to parallelize that loop as well. How can this nested "while" loop be parallelized with PPL? Please help.

There is 1 answer

bobbymcr On

As user High Performance Mark hints in his comment, parallel reads from the same ifstream instance will cause undefined and incorrect behavior. (For some more discussion, see question "Is std::ifstream thread-safe & lock-free?".) You're basically at the parallelization limit here with this particular algorithm.

As a side note, even reading multiple different file streams in parallel will not really speed things up if they are all being read from the same physical volume. The disk hardware can only actually support so many parallel requests (typically not more than one at a time, queuing up any requests that come in while it is busy). For some more background, you might want to check out Mark Friedman's Top Six FAQs on Windows 2000 Disk Performance; the performance counters are Windows-specific, but most of the information is of general use.