Epoll TCP edge-triggered necessity of last read(2) call


Given a nonblocking TCP socket, if the call

read(sock, buf, bufLen)

returns a value < bufLen, is it safe to then wait for an edge-triggered EPOLLIN event? Or must I call read again to ensure it's zero or EAGAIN?

In my testing, everything keeps working when I remove the last call. I just want to know whether this is guaranteed anywhere, by documentation or by the Linux source code, so that I can get rid of the extra call.


There are 2 answers

George Y. (best answer)

Your question is answered in man 7 epoll. As you see, it depends on the socket type (packet/stream):

Q9 Do I need to continuously read/write a file descriptor until EAGAIN when using the EPOLLET flag (edge-triggered behavior) ?

A9 Receiving an event from epoll_wait(2) should suggest to you that such file descriptor is ready for the requested I/O operation. You must consider it ready until the next (nonblocking) read/write yields EAGAIN. When and how you will use the file descriptor is entirely up to you.

For packet/token-oriented files (e.g., datagram socket, terminal in canonical mode), the only way to detect the end of the read/write I/O space is to continue to read/write until EAGAIN.

For stream-oriented files (e.g., pipe, FIFO, stream socket), the condition that the read/write I/O space is exhausted can also be detected by checking the amount of data read from / written to the target file descriptor. For example, if you call read(2) by asking to read a certain amount of data and read(2) returns a lower number of bytes, you can be sure of having exhausted the read I/O space for the file descriptor. The same is true when writing using write(2). (Avoid this latter technique if you cannot guarantee that the monitored file descriptor always refers to a stream-oriented file.)
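The stream-only shortcut from A9 can be sketched in C roughly as follows. This is a minimal sketch, not code from the man page: `set_nonblocking`, `drain_stream`, and the `drain_status` enum are hypothetical names, and it assumes the descriptor is a nonblocking stream (TCP socket, pipe, FIFO).

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Outcome of one drain pass over a nonblocking, stream-oriented descriptor. */
enum drain_status {
    DRAIN_WAIT,   /* input exhausted for now: go back to epoll_wait */
    DRAIN_EOF,    /* read returned 0: the peer closed the connection */
    DRAIN_ERROR   /* read failed with something other than EAGAIN/EINTR */
};

/* Hypothetical helper: switch fd to nonblocking mode. Returns 0 or -1. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/*
 * Hypothetical helper: read from fd until the input is known to be empty.
 * Stops on a short read (the stream-only shortcut from A9) or on EAGAIN.
 * Adds the number of bytes consumed to *total.
 */
static enum drain_status drain_stream(int fd, char *buf, size_t cap, size_t *total)
{
    assert(cap > 0);
    for (;;) {
        ssize_t n = read(fd, buf, cap);
        if (n > 0) {
            *total += (size_t)n;         /* process buf[0..n) here */
            if ((size_t)n < cap)
                return DRAIN_WAIT;       /* short read: nothing left right now */
            continue;                    /* buffer was filled: more may be pending */
        }
        if (n == 0)
            return DRAIN_EOF;
        if (errno == EINTR)
            continue;                    /* interrupted by a signal: retry */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return DRAIN_WAIT;
        return DRAIN_ERROR;
    }
}
```

With a 4-byte buffer and 10 bytes pending, the loop reads 4, 4, then 2 bytes and returns DRAIN_WAIT on the short read, so the extra read that would return EAGAIN is skipped. One caveat, stated here as my assumption rather than something the quoted text says: if the peer's FIN is already queued behind the data, a short read will not reveal it, so code relying on this shortcut would typically also register EPOLLRDHUP and check for it in the returned events.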

Damon

It is "safe" insofar as it won't crash, but unless you continue calling read until you get EAGAIN (or zero, which means the other end has closed the connection), you will sometimes make wrong assumptions about the availability of data. What's worse, it will look like it works fine most of the time.

Edge-triggered notification, as opposed to level-triggered, only guarantees that you get one notification if the readiness state has changed since the last time you called epoll_wait, even if data remains that you could read.
Edge-triggered event notification sometimes behaves in odd or unintuitive ways under Linux, so it may do something different from what you expect, e.g. give you another notification when more data arrives (so your code appears to "work anyway"), but that is not what is being guaranteed.
I've had similar "surprises" when using epoll with eventfd. In edge-triggered mode you would expect every thread already blocked in epoll_wait to wake up (all at the same time, and exactly once), and every thread calling epoll_wait after the event is signalled to block until the event is consumed and signalled again. What really happens is that only the first thread that called epoll_wait wakes up. And, surprise again, level-triggered mode works exactly as you'd wish, except that you must consume the event before it can be signalled again, and there is no proper way to do that (you must do it exactly once, or you'll block in read).

Thus, if you don't consume all the data and later wait to be notified again, you may be lucky and it will "work anyway", or you may wait for quite a long time, possibly forever. My recommendation is therefore to keep reading until you get EAGAIN. It is the only truly reliable way to avoid surprises.
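A minimal sketch of that recommendation, assuming a nonblocking descriptor registered with EPOLLIN | EPOLLET, could look like this. The helper names (`make_nonblocking`, `watch_edge_triggered`, `drain_until_eagain`, `poll_once`) are made up for illustration, and closing the descriptor on EOF or error is just one possible policy.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/epoll.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: switch fd to nonblocking mode. Returns 0 or -1. */
static int make_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Register fd for edge-triggered read notifications. Returns 0 or -1. */
static int watch_edge_triggered(int epfd, int fd)
{
    struct epoll_event ev = { .events = EPOLLIN | EPOLLET, .data = { .fd = fd } };
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/*
 * Read until EAGAIN, adding the bytes consumed to *total.
 * Returns 1 if drained with the connection still open, 0 on EOF, -1 on error.
 */
static int drain_until_eagain(int fd, size_t *total)
{
    char buf[4096];
    assert(total != NULL);
    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0) {
            *total += (size_t)n;   /* process buf[0..n) here */
            continue;
        }
        if (n == 0)
            return 0;              /* peer closed the connection */
        if (errno == EINTR)
            continue;              /* interrupted by a signal: retry */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return 1;              /* fully drained: safe to wait again */
        return -1;
    }
}

/* One event-loop iteration: wait, then drain every descriptor reported ready. */
static int poll_once(int epfd, int timeout_ms, size_t *total)
{
    struct epoll_event events[16];
    int n = epoll_wait(epfd, events, 16, timeout_ms);
    for (int i = 0; i < n; i++) {
        int fd = events[i].data.fd;
        if (drain_until_eagain(fd, total) <= 0)
            close(fd);   /* EOF or error; closing the last reference also drops it from the epoll set */
    }
    return n;
}
```

Because every wakeup ends in EAGAIN, the next epoll_wait only has to report new arrivals, which is exactly what edge-triggered mode promises.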

Do note that you can starve slow senders if you naively keep reading. If one sender is very fast and you keep reading from it, you will never see EAGAIN (at least not for as long as the other end keeps sending), and you will completely starve the other senders.
It therefore makes sense to put all ready descriptors in a list and read from them round-robin, removing each one from the list when it returns EAGAIN.
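The round-robin idea can be sketched like this. It is only an illustration under assumptions: `ready_list`, `ready_add`, `ready_service_pass`, the 64-entry capacity, and the 4096-byte per-turn budget (`CHUNK`) are all invented for the example, and every descriptor is assumed to be nonblocking.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

#define CHUNK 4096        /* assumed per-turn read budget */
#define MAX_READY 64      /* assumed capacity of the ready list */

struct ready_list {
    int    fds[MAX_READY];
    size_t len;
};

/* Hypothetical helper: switch fd to nonblocking mode. Returns 0 or -1. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Call when epoll_wait reports EPOLLIN for fd. Returns 0, or -1 if the list is full. */
static int ready_add(struct ready_list *rl, int fd)
{
    for (size_t i = 0; i < rl->len; i++)
        if (rl->fds[i] == fd)
            return 0;                 /* already queued */
    if (rl->len == MAX_READY)
        return -1;
    rl->fds[rl->len++] = fd;
    return 0;
}

/*
 * Give every queued descriptor one read of at most CHUNK bytes.
 * A descriptor leaves the list on EAGAIN, EOF, or error; a real loop
 * would also close it in the last two cases. Returns bytes read this pass.
 */
static size_t ready_service_pass(struct ready_list *rl)
{
    char buf[CHUNK];
    size_t total = 0;
    size_t i = 0;
    assert(rl->len <= MAX_READY);
    while (i < rl->len) {
        ssize_t n = read(rl->fds[i], buf, sizeof buf);
        if (n > 0) {
            total += (size_t)n;       /* process buf[0..n) here */
            i++;                      /* still ready: keep it for the next pass */
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;                 /* retry the same descriptor */
        /* Swap-remove; the element moved into slot i gets its turn next. */
        rl->fds[i] = rl->fds[--rl->len];
    }
    return total;
}
```

In an event loop built on this, one would call epoll_wait with a zero timeout while the list is non-empty (to pick up newly ready descriptors between passes) and block only once the list is empty. A fast sender then costs at most one CHUNK-sized read per pass instead of monopolizing the loop.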