As far as I know, Linux epoll is an asynchronous notification mechanism: when a file descriptor becomes readable/writable/acceptable, epoll_wait
will return this fd. But the read or write itself is still synchronous and will block the thread.
So Redis 6.0 uses a thread pool to handle network I/O.
Windows IOCP and Linux io_uring are Proactors: when io_uring_enter
returns, the read data has already been placed in the buffer and the write buffer has been fully written.
My questions are:
- Who is responsible for copying the buffer data?
- Does read/write still block the current thread?
- If so, how does a thread pool speed things up?
Not sure if this is still helpful after half a year, but it's probably worth answering for other users who are wondering about the same thing.
1. Who is responsible for copying these buffer data?
Both IOCP and io_uring work on the OS kernel side. In the case of io_uring, the kernel executes the submitted operations itself, spawning worker threads for those that cannot complete without blocking, and signals completion via the completion queue (CQ). This means that you not only avoid calling read() and write() yourself, but these operations are also performed entirely in the kernel, which saves your currently running thread from unnecessary syscalls (context switches between user and kernel mode are quite expensive).
You can check the following article to understand it a bit better: https://blog.cloudflare.com/missing-manuals-io_uring-worker-pool/
In addition, you can think of io_uring as an effective mechanism for batching syscalls: it lets you invoke many OS operations for the price of a single syscall, io_uring_enter.
The IOCP mechanism is quite similar. I wasn't able to find out exactly how it uses kernel threads to execute the tasks, but it is safe to assume that it uses at least one kernel thread to handle its driver IRPs (I/O request packets).
To answer your question: the kernel and its kernel-mode threads are responsible for copying the buffer data.
2. Does read/write still block the current thread?
If you use Overlapped I/O or io_uring with non-blocking files/sockets, the calls submitted to the kernel don't block the current thread. You only need to block your thread when you wait on (or poll) the completion queue.
A little addition about epoll and blocking reads or writes:
Reads or writes on ready file descriptors are not really "blocking" your thread. For example, if there is data available on a socket, the read() operation will just copy it from the kernel buffer to your own buffer and that's it. There is no real blocking beyond paying the price of a syscall. However, it's still possible to parallelize those operations using a thread pool. You can even parallelize reads for a single socket, but that requires an understanding of the EPOLLONESHOT and EPOLLEXCLUSIVE flags to avoid race conditions and the "thundering herd" problem.
This is very well explained in this article: https://idea.popcount.org/2017-02-20-epoll-is-fundamentally-broken-12/