epoll in edge trigger mode is a strange beast. It requires the process to keep track of what the last response for each monitored FD is. It mandates the process to handle, without fail, each and every event reported (or else we might think that an FD is not reporting anything whilst it is, in fact, muted by the edge trigger behavior).
What are the use cases where edge trigger epoll makes sense?
The main use case for EPOLLET that I'm aware of is with micro-threads. To recap: user space is doing context switches between micro-threads (which I'm going to call "fibers" because it's shorter) based on the availability of something to work on. This is also called "cooperative multi-tasking".
The basic handling of file descriptors is by wrapping the relevant IO functions like so:
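A minimal sketch of such a wrapper, in C, might look like the following. The names fiber_read and set_nonblocking are made up for illustration, and start_monitoring and wait_event are stubbed so that the example compiles on its own: the stub for wait_event simply blocks in poll() where a real fiber runtime would switch to another fiber.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: the wrapper below only works on non-blocking fds. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Stand-in for the fiber runtime (assumption): a real implementation
 * would make sure fd is registered with the scheduler's poller. */
static void start_monitoring(int fd)
{
    (void)fd;
}

/* Stand-in for "context switch out until fd is readable" (assumption):
 * this sketch blocks in poll() instead of switching fibers. */
static void wait_event(int fd)
{
    struct pollfd p = { .fd = fd, .events = POLLIN };
    while (poll(&p, 1, -1) < 0 && errno == EINTR)
        ;
}

/* Wrapper around read(2): retry after sleeping whenever no data is ready. */
static ssize_t fiber_read(int fd, void *buf, size_t len)
{
    ssize_t n;

    assert(fd >= 0);
    while ((n = read(fd, buf, len)) < 0 && errno == EAGAIN) {
        start_monitoring(fd);
        wait_event(fd);
    }
    return n;
}
```

When data is already available, the loop body never runs and the call costs a single read.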
start_monitoring is a function that makes sure that fd is monitored for read availability. wait_event performs a context switch out until the scheduler re-awakens this fiber because fd now has data ready for reading.
The usual way to implement this with epoll is to call EPOLL_CTL_MOD on fd within start_monitoring to add listening for EPOLLIN, and again after epoll has reported the event to stop listening for EPOLLIN.
This means that a read that has data available will finish within 1 system call, but a read that returns EAGAIN will take at least 4 system calls (the original read, two EPOLL_CTL_MOD calls, and the final read that succeeds).
Notice that the above does not count the epoll_wait that also has to take place. I do not count it because I'm taking the generous assumption that other fibers are also about to be woken with that same system call, so it is unfair to attribute its cost entirely to our fiber. All in all, this mechanism needs 4+x system calls, where x is between 0 and 1.
One way to reduce the cost is to use EPOLLONESHOT. Doing so removes fd from monitoring automatically, reducing our cost to 3+x. Better, but we can do better yet.
Enter EPOLLET. The previous fd state can be either armed or unarmed (i.e., whether the next event will trigger epoll). Also, the fd may or may not currently (at the point of entry to read) have data ready. Four states. Let's spread them out.
Ready (whether armed or not): The first call to read returns the data. 1 system call. This path does not change the armed state, and the ready state depends on whether we read everything.
Not ready (whether armed or not): The first call to read returns EAGAIN, thus arming the fd. We go to sleep in wait_event without having to execute another system call. Once we wake up, we are in unarmed mode (as we just woke up). We thus do not need to call epoll_ctl to disable listening on the fd. We call read, which returns the data. We leave the function either ready or not, but unarmed.
Total cost: 2+x.
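The EPOLLET path described above can be sketched roughly as follows. This is an illustration under assumptions, not a complete fiber runtime: monitor_edge_triggered and fiber_read_et are made-up names, and wait_event is a stand-in that blocks in epoll_wait where a real scheduler would switch to another fiber. The point to notice is that epoll_ctl is called once, at registration, and never on the read path.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Register fd once, edge-triggered. It is never modified afterwards. */
static int monitor_edge_triggered(int epfd, int fd)
{
    struct epoll_event ev = { .events = EPOLLIN | EPOLLET };
    int flags = fcntl(fd, F_GETFL);

    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return -1;
    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Stand-in for the scheduler (assumption): block until epoll reports fd.
 * Events for other fds are dropped here; a real scheduler would wake
 * the fibers waiting on them instead. */
static int wait_event(int epfd, int fd)
{
    struct epoll_event ev;

    for (;;) {
        int n = epoll_wait(epfd, &ev, 1, -1);
        if (n == 1 && ev.data.fd == fd)
            return 0;
        if (n < 0 && errno != EINTR)
            return -1;
    }
}

/* Ready: one read. Not ready: read (EAGAIN), sleep, read. No epoll_ctl. */
static ssize_t fiber_read_et(int epfd, int fd, void *buf, size_t len)
{
    ssize_t n;

    assert(epfd >= 0 && fd >= 0);
    while ((n = read(fd, buf, len)) < 0 && errno == EAGAIN) {
        if (wait_event(epfd, fd) < 0)
            return -1;
    }
    return n;
}
```

The cost of the shared epoll_wait is the x in the 2+x total.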
We will have to face one spurious wakeup per fd, as the fd starts out armed. Our code will have to handle the case where epoll reports an fd for which no fiber is listening. Handling, in this case, just means ignore and move on. The FD will not be spuriously reported again.
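One way the "ignore and move on" handling could look inside a scheduler is sketched below. Everything here is hypothetical: struct fiber, the waiting table indexed by fd, MAX_FDS, and dispatch_events are illustrative names, and "waking" a fiber is reduced to setting a flag.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/epoll.h>
#include <unistd.h>

#define MAX_FDS 1024

struct fiber { int runnable; };          /* hypothetical fiber handle */
static struct fiber *waiting[MAX_FDS];   /* fiber parked on each fd, or NULL */

/* Poll once and wake the fibers whose fds were reported.
 * Returns the number of fibers woken, or -1 if epoll_wait failed. */
static int dispatch_events(int epfd, int timeout_ms)
{
    struct epoll_event evs[64];
    int n = epoll_wait(epfd, evs, 64, timeout_ms);
    int woken = 0;

    if (n < 0)
        return -1;
    for (int i = 0; i < n; i++) {
        int fd = evs[i].data.fd;
        struct fiber *f;

        assert(fd >= 0 && fd < MAX_FDS);
        f = waiting[fd];
        if (f == NULL)
            continue;            /* no fiber is listening: ignore, move on */
        waiting[fd] = NULL;      /* the fiber is no longer parked on fd */
        f->runnable = 1;         /* stand-in for rescheduling the fiber */
        woken++;
    }
    return woken;
}
```

A fiber would store itself in waiting[fd] inside wait_event before switching out, so an event that arrives while the slot is empty is simply dropped.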