I'm trying to use IOCP through the Windows API functions CreateThreadpoolIo and StartThreadpoolIo, but I found that the thread pool only parallelizes the code that runs after an I/O completes. The async I/O submit operations are still executed sequentially on the main thread. So why do we need this? I think making the I/O submit operations parallel could improve throughput even though they are async operations, right?
The other cost is that if we make them parallel, we might need to lock something to guarantee data consistency (thread-safe operation).
It is possible to do IOCP without CreateThreadpoolIo / StartThreadpoolIo; in that case you have to manage the calls to GetQueuedCompletionStatus yourself (whether in a self-managed thread pool or otherwise - it is even conceivable to interleave them into the work of the thread that started the I/O, but in that case why bother with IOCP?). StartThreadpoolIo is needed so that the thread pool has a thread waiting on GetQueuedCompletionStatus instead of WaitForMultipleObjects (or one of its variants); it has to be called before each asynchronous operation you issue on the handle. Conceptually, the pool keeps a counter of how many IOCP operations are outstanding. CancelThreadpoolIo decrements that counter, and you must call it when the I/O call fails immediately (with any error other than ERROR_IO_PENDING), because no completion will ever arrive for that operation. When the counter reaches 0, the thread pool knows it can stop waiting on GetQueuedCompletionStatus.
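To make the Start/Cancel pairing concrete, here is a minimal sketch of one overlapped read issued through the thread pool. The names `ReadRequest`, `on_io_complete` and `demo_threadpool_read` are made up for illustration, the 4096-byte buffer is arbitrary, and the event is only there so the submitting thread can wait for the callback in this toy example. The non-Windows stub exists solely so the file compiles elsewhere.

```c
#include <stddef.h>

#ifdef _WIN32
#include <windows.h>

/* Hypothetical per-request context. OVERLAPPED is the first member so the
   callback can cast the OVERLAPPED pointer back to the whole request. */
typedef struct {
    OVERLAPPED ov;
    char       buf[4096];
    HANDLE     done;    /* signalled by the completion callback */
    ULONG      result;  /* Win32 error code of the finished I/O */
    ULONG_PTR  bytes;   /* bytes transferred */
} ReadRequest;

/* Runs on a thread pool thread once the completion packet is dequeued. */
static VOID CALLBACK on_io_complete(PTP_CALLBACK_INSTANCE inst, PVOID ctx,
                                    PVOID overlapped, ULONG io_result,
                                    ULONG_PTR bytes, PTP_IO io)
{
    ReadRequest *req = (ReadRequest *)overlapped;
    (void)inst; (void)ctx; (void)io;
    req->result = io_result;
    req->bytes  = bytes;
    SetEvent(req->done);
}

/* Reads up to 4096 bytes from the start of `path`.
   Returns the number of bytes read, or -1 on any failure. */
long demo_threadpool_read(const wchar_t *path)
{
    ReadRequest req;
    PTP_IO io;
    long ret = -1;
    HANDLE file = CreateFileW(path, GENERIC_READ, FILE_SHARE_READ, NULL,
                              OPEN_EXISTING, FILE_FLAG_OVERLAPPED, NULL);
    if (file == INVALID_HANDLE_VALUE)
        return -1;

    ZeroMemory(&req, sizeof req);   /* offset 0, no hEvent in OVERLAPPED */
    req.done = CreateEventW(NULL, TRUE, FALSE, NULL);
    /* Binds the handle to the pool's completion port. */
    io = CreateThreadpoolIo(file, on_io_complete, NULL, NULL);
    if (req.done == NULL || io == NULL) {
        if (io) CloseThreadpoolIo(io);
        if (req.done) CloseHandle(req.done);
        CloseHandle(file);
        return -1;
    }

    /* Must precede every asynchronous operation on the bound handle. */
    StartThreadpoolIo(io);
    if (!ReadFile(file, req.buf, sizeof req.buf, NULL, &req.ov) &&
        GetLastError() != ERROR_IO_PENDING) {
        /* The read never started, so no completion will arrive:
           undo the StartThreadpoolIo call. */
        CancelThreadpoolIo(io);
    } else {
        /* Pending or completed synchronously: the callback still fires. */
        WaitForSingleObject(req.done, INFINITE);
        if (req.result == NO_ERROR)
            ret = (long)req.bytes;
    }

    CloseHandle(file);
    WaitForThreadpoolIoCallbacks(io, FALSE);
    CloseThreadpoolIo(io);
    CloseHandle(req.done);
    return ret;
}

#else
/* Stub for non-Windows builds: the thread pool I/O API is Windows-only. */
long demo_threadpool_read(const wchar_t *path)
{
    (void)path;
    return -1;
}
#endif
```

Calling `demo_threadpool_read` with the path of an existing file from `main` should return the number of bytes read. Note that the submission itself (StartThreadpoolIo followed by ReadFile) happens on whichever thread calls the function; only `on_io_complete` runs on a pool thread, which is the behaviour the question describes. A real program would not block on an event after each submission, since that defeats the point of asynchronous I/O.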