Is there any way to use IOCP to notify when a socket is readable / writeable?

1.3k views Asked by At

I'm looking for some way to get a signal on an I/O completion port when a socket becomes readable/writeable (i.e. the next send/recv will complete immediately). Basically I want an overlapped version of WSASelect.

(Yes, I know that for many applications, this is unnecessary, and you can just keep issuing overlapped send calls. But in other applications you want to delay generating the message to send until the last moment possible, as discussed e.g. here. In these cases it's useful to do (a) wait for socket to be writeable, (b) generate the next message, (c) send the next message.)

So far the best solution I've been able to come up with is to spawn a thread just to call select and then PostQueuedCompletionStatus, which is awful and not particularly scalable... is there any better way?

2

There are 2 answers

0
Nathaniel J. Smith On BEST ANSWER

It turns out that this is possible!

Basically the trick is:

  • Use the WSAIoctl SIO_BASE_HANDLE to peek through any "layered service providers"
  • Use DeviceIoControl to submit an AFD_POLL request for the base handle, to the AFD driver (this is what select does internally)

There are many, many complications that are probably worth understanding, but at the end of the day the above should just work in practice. This is supposed to be a private API, but libuv uses it, and MS's compatibility policies mean that they will never break libuv, so you're fine. For details, read the thread starting from this message: https://github.com/python-trio/trio/issues/52#issuecomment-424591743

6
Nathaniel J. Smith On

For detecting that a socket is readable, it turns out that there is an undocumented but well-known piece of folklore: you can issue a "zero byte read", i.e., an overlapped WSARecv with a zero-byte receive buffer, and that will not complete until there is some data to be read. This has been recommended for servers that are trying to do simultaneous reads from a large number of mostly-idle sockets, in order to avoid problems with memory usage (apparently IOCP receive buffers get pinned into RAM). An example of this technique can be seen in the libuv source code. They also have an additional refinement, which is that to use this with UDP sockets, they issue a zero-byte receive with MSG_PEEK set. (This is important because without that flag, the zero-byte receive would consume a packet, truncating it to zero bytes.) MSDN claims that you can't combine MSG_PEEK with overlapped I/O, but apparently it works for them...

Of course, that's only half of an answer, because there's still the question of detecting writability.

It's possible that a similar "zero-byte send" trick would work? (Used directly for TCP, and adding the MSG_PARTIAL flag on UDP sockets, to avoid actually sending a zero-byte packet.) Experimentally I've checked that attempting to do a zero-byte send on a non-writable non-blocking TCP socket returns WSAEWOULDBLOCK, so that's a promising sign, but I haven't tried with overlapped I/O. I'll get around to it eventually and update this answer; or alternatively if someone wants to try it first and post their own consolidated answer then I'll probably accept it :-)