I'm trying to optimize an I/O-bound C++ Win32 application. What it does is very similar to recursing a folder and computing a cryptographic hash of every file it finds. It's a single-threaded application using memory-mapped files, so, as you can imagine, it doesn't use much CPU: most of the time the main thread is asleep waiting for I/O to complete. I'm thinking about a couple of solutions, but I'm not sure about them, so I'd like to have your opinions.
- I could spawn many threads (with a fixed-size pool of workers to keep memory usage under a certain threshold), but honestly I don't know if this would make the situation better: each thread is going to be put to sleep just like my main thread in the current implementation, plus the scheduler would "waste" a lot of computation power switching contexts.
- I was thinking about I/O completion ports (single thread? multiple?), but that would mean abandoning the memory-mapped files (am I wrong?) and using standard file operations. If this is the case, could you please provide some example code on how to use IOCP to read and process a given list of files without putting the reading thread to sleep?
Any other idea/suggestion/etc would be really appreciated :)
Thanks.
Before parallelizing anything, always ask yourself first: does the added complexity justify the performance gained? To answer that question with minimal effort, test what percentage of the maximum read throughput you already achieve. That is, measure your current read throughput and then measure the maximum throughput. Don't use the theoretical maximum for this computation. Then think about how much complexity and how many possible issues even the simplest approach introduces to gain the last few percent.
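One way to get a practical "max throughput" number is to time a plain sequential read that does no hashing at all, and compare it with the throughput of the full program on the same files. The sketch below is only an illustration: `ReadStats` and `time_sequential_read` are names made up for this example, and it uses portable `std::ifstream` rather than the memory-mapped files from the question.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper type: bytes read and wall-clock time taken.
struct ReadStats {
    std::size_t bytes = 0;
    double seconds = 0.0;

    double mib_per_second() const {
        return seconds > 0.0 ? (bytes / (1024.0 * 1024.0)) / seconds : 0.0;
    }
};

// Reads the whole file sequentially and discards the data, so the
// result approximates raw read throughput without any hashing cost.
ReadStats time_sequential_read(const std::string& path,
                               std::size_t buffer_size = 1 << 20) {
    assert(buffer_size > 0);
    std::ifstream in(path, std::ios::binary);
    std::vector<char> buffer(buffer_size);
    ReadStats stats;

    const auto start = std::chrono::steady_clock::now();
    for (;;) {
        in.read(buffer.data(), static_cast<std::streamsize>(buffer.size()));
        const std::streamsize n = in.gcount();
        if (n <= 0) break;  // end of file, or the file could not be opened
        stats.bytes += static_cast<std::size_t>(n);
    }
    const auto stop = std::chrono::steady_clock::now();

    stats.seconds = std::chrono::duration<double>(stop - start).count();
    return stats;
}
```

Keep in mind that the OS file cache will inflate the numbers for files that were read recently, so measure on data that is not already cached, and use the same files for both measurements.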
As already mentioned in the comments, the greatest performance gain here is probably achieved by pipelining (i.e. overlapping computation and I/O). The easiest way to implement that is with asynchronous reads. This thread lists multiple ways to implement asynchronous file I/O in C++.
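To make the pipelining idea concrete, here is a minimal portable sketch that uses two buffers and `std::async`: while one chunk is being hashed, the next one is being read on another thread. It is not the OVERLAPPED or IOCP API, just an illustration of the overlap itself. `Fnv1a`, `read_chunk` and `hash_file_pipelined` are hypothetical names, and FNV-1a is only a stand-in for the real hash (it is not cryptographic).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <future>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Stand-in for the real hash: 64-bit FNV-1a. NOT cryptographic.
struct Fnv1a {
    std::uint64_t state = 14695981039346656037ULL;

    void update(const char* data, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) {
            state ^= static_cast<unsigned char>(data[i]);
            state *= 1099511628211ULL;
        }
    }
};

// Reads up to buf.size() bytes; returns how many were actually read.
static std::size_t read_chunk(std::ifstream& in, std::vector<char>& buf) {
    in.read(buf.data(), static_cast<std::streamsize>(buf.size()));
    return static_cast<std::size_t>(in.gcount());
}

std::uint64_t hash_file_pipelined(const std::string& path,
                                  std::size_t chunk_size = 1 << 20) {
    assert(chunk_size > 0);
    std::ifstream in(path, std::ios::binary);
    if (!in) throw std::runtime_error("cannot open " + path);

    std::vector<char> current(chunk_size), next(chunk_size);
    Fnv1a hasher;

    std::size_t n = read_chunk(in, current);  // prime the pipeline
    while (n > 0) {
        // Start reading the next chunk on another thread...
        auto pending = std::async(std::launch::async, [&in, &next] {
            return read_chunk(in, next);
        });
        // ...while this thread hashes the chunk already in memory.
        hasher.update(current.data(), n);

        n = pending.get();          // wait for the read to finish
        std::swap(current, next);   // the freshly read chunk becomes current
    }
    return hasher.state;
}
```

The stream is only ever touched by one thread at a time, because the main thread does not use `in` between launching the read and calling `pending.get()`. Launching a task per chunk has some overhead, so fairly large chunks are the sensible choice here.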
If you don't need portability, just use the Windows OVERLAPPED API. Boost ASIO does not seem to make File I/O very easy (yet). I could not find any good examples.
Note that, depending on your system configuration, you may have to launch multiple threads to fully saturate I/O bandwidth (especially if the files in that folder actually reside on multiple disks, which is possible). Even if you only read from one device, you might fare (slightly) better with multiple threads to mitigate OS overhead.
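If you do go for multiple threads, a fixed number of workers pulling from a shared file list keeps memory usage bounded. The following is a rough sketch under assumptions, not a tuned implementation: `hash_file` and `hash_files_parallel` are hypothetical names, the per-file work is again a non-cryptographic FNV-1a placeholder, and the right worker count is something you would have to find by measuring.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <thread>
#include <vector>

// Placeholder for the real per-file work: FNV-1a over the file contents.
std::uint64_t hash_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    std::uint64_t h = 14695981039346656037ULL;
    std::vector<char> buf(64 * 1024);
    for (;;) {
        in.read(buf.data(), static_cast<std::streamsize>(buf.size()));
        const std::streamsize n = in.gcount();
        if (n <= 0) break;
        for (std::streamsize i = 0; i < n; ++i) {
            h ^= static_cast<unsigned char>(buf[static_cast<std::size_t>(i)]);
            h *= 1099511628211ULL;
        }
    }
    return h;
}

// Hashes all files with a fixed number of worker threads.
// results[i] corresponds to paths[i].
std::vector<std::uint64_t> hash_files_parallel(
        const std::vector<std::string>& paths, unsigned workers) {
    assert(workers > 0);
    std::vector<std::uint64_t> results(paths.size());
    std::atomic<std::size_t> next{0};  // index of the next unclaimed file

    auto work = [&] {
        // Each worker claims one index at a time, so no two threads
        // ever write to the same element of results.
        for (std::size_t i = next.fetch_add(1); i < paths.size();
             i = next.fetch_add(1)) {
            results[i] = hash_file(paths[i]);
        }
    };

    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) pool.emplace_back(work);
    for (auto& t : pool) t.join();
    return results;
}
```

Starting with a small number of workers (two to four, say) and measuring throughput for each setting is probably the quickest way to see whether the extra threads help on your hardware.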