AIO support on Linux

Does anyone know where I can get up-to-date information about the state of kernel support for AIO in the latest Linux kernel? Google searches bring up web pages that may be hopelessly out of date.

Edit:

More specifically, I am interested in non-file descriptors like pipes and sockets. Stuff on the web indicates that there is no support; is this still the case?

Edit2: What I am looking for is something similar to Windows OVERLAPPED IO

There are 3 answers

6
Edwin Buck On BEST ANSWER

AIO support has been included in the Linux kernel proper. That's why the first hit on Google only offers patches to the 2.4 Linux kernel. In 2.6 and 3.0 it's already in there.

If you check out the Linux kernel source code, it's at fs/aio.c

There's some documentation in the GNU libc manual, but be advised that aio is not possible for all types of Linux file descriptors. Most of the general "how to" documentation is dated around 2006, which is appropriate since that's when AIO in Linux was making the headlines.

Note that the POSIX.1b and Unix98 standards haven't changed, so can you be a bit more specific about what makes the examples out of date?

7
Ambroz Bizjak On

You don't need POSIX AIO (i.e. man aio) to use sockets and pipes asynchronously. According to man 3 aio it is not even possible. You should use non-blocking file descriptors instead, together with an event notification interface, such as select(), poll(), or epoll. epoll is Linux specific, but scales much better than the former two.

To use file descriptors in non-blocking mode you have to set the O_NONBLOCK flag on every file descriptor:

fcntl(fd, F_SETFL, fcntl(fd, F_GETFL, 0) | O_NONBLOCK)

After a file descriptor is in non-blocking mode, I/O operations like read() and write() will never block; if the operation cannot be completed immediately, they return -1 with errno set to EAGAIN or EWOULDBLOCK. Some more specific operations, like connect(), have to be used in a different way in non-blocking mode; see the relevant man pages.
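A minimal sketch of the two steps above, for illustration only. The helper names set_nonblocking() and try_read() and the -2 return convention are assumptions made up for this example; only fcntl(), read() and the errno values come from the system API.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: add O_NONBLOCK while keeping the other status flags.
 * Returns 0 on success, -1 on error (errno is set by fcntl). */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Hypothetical helper: a single non-blocking read attempt.
 * Returns the byte count (0 means end of file), -2 if nothing is
 * available right now, or -1 on any other error. */
ssize_t try_read(int fd, void *buf, size_t len)
{
    assert(buf != NULL);
    ssize_t n = read(fd, buf, len);
    if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return -2;   /* not an error: try again once the fd is readable */
    return n;
}
```

Checking for both EAGAIN and EWOULDBLOCK is the portable habit; on Linux the two constants have the same value.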

To be able to use non-blocking file descriptors correctly, your application needs to be event driven. Basically, in main(), you need to first initialize stuff, then enter the event loop. The event loop repeatedly waits for events (using an event notification interface, e.g. epoll_wait()), then checks which events happened, and responds to them.
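One possible shape for such a loop, sketched with epoll. The names event_cb, watch_fd(), run_event_loop(), count_bytes() and bytes_seen are hypothetical and exist only for this example; a real application would usually attach per-descriptor state instead of using a single callback and a global counter.

```c
#include <assert.h>
#include <errno.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical callback type: called with the ready descriptor and the
 * epoll event mask. Returning non-zero asks the loop to stop. */
typedef int (*event_cb)(int fd, unsigned int events);

/* Register fd with the epoll instance for the given event mask. */
int watch_fd(int epfd, int fd, unsigned int events)
{
    struct epoll_event ev = { .events = events, .data.fd = fd };
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Wait for events and dispatch each one until a callback asks to stop.
 * Returns 0 on a requested stop, -1 if epoll_wait() fails. */
int run_event_loop(int epfd, event_cb on_event)
{
    struct epoll_event ready[16];

    assert(on_event != NULL);
    for (;;) {
        int n = epoll_wait(epfd, ready, 16, -1);  /* sleep until something happens */
        if (n == -1) {
            if (errno == EINTR)
                continue;                         /* interrupted by a signal: retry */
            return -1;
        }
        for (int i = 0; i < n; i++) {
            if (on_event(ready[i].data.fd, ready[i].events))
                return 0;
        }
    }
}

/* Example callback: count the bytes read, stop on end of file or error. */
size_t bytes_seen = 0;

int count_bytes(int fd, unsigned int events)
{
    char buf[256];
    ssize_t n;

    (void)events;                 /* this simple callback only ever reads */
    n = read(fd, buf, sizeof buf);
    if (n > 0)
        bytes_seen += (size_t)n;
    return n <= 0;
}
```

A caller would create the instance with epoll_create1(0), register descriptors with watch_fd(epfd, fd, EPOLLIN), and then hand control to run_event_loop(epfd, count_bytes).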

Now, when you call read(), say, and it fails with EWOULDBLOCK, you add the descriptor to the list of file descriptors watched for readability; when the event provider indicates readability, you try again.

Similarly, if you try to write() and it fails with EWOULDBLOCK, you might want to buffer the data and try again when writability is indicated.
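A rough sketch of that write-side buffering. The struct out_queue type, its 4096-byte size, and the functions flush_queue() and send_or_queue() are assumptions made up for this example, not part of any library. For simplicity it always copies into the backlog first, which keeps ordering correct at the cost of one extra copy.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>      /* O_NONBLOCK, which callers set on the descriptor */
#include <string.h>
#include <unistd.h>

/* Hypothetical fixed-size backlog for one non-blocking descriptor. */
struct out_queue {
    char   data[4096];
    size_t len;          /* bytes still waiting to be written */
};

/* Try to write the backlog. Returns 0 if it is now empty, 1 if data
 * remains (wait for writability and call again), -1 on a real error. */
int flush_queue(int fd, struct out_queue *q)
{
    while (q->len > 0) {
        ssize_t n = write(fd, q->data, q->len);
        if (n == -1) {
            if (errno == EAGAIN || errno == EWOULDBLOCK)
                return 1;
            if (errno == EINTR)
                continue;
            return -1;
        }
        /* Drop the bytes the kernel accepted, keep the rest at the front. */
        memmove(q->data, q->data + n, q->len - (size_t)n);
        q->len -= (size_t)n;
    }
    return 0;
}

/* Append len bytes to the backlog, then write as much as possible.
 * Returns like flush_queue(), or -1 with ENOBUFS if the backlog is full. */
int send_or_queue(int fd, struct out_queue *q, const char *buf, size_t len)
{
    assert(q != NULL);
    if (len > sizeof q->data - q->len) {
        errno = ENOBUFS;
        return -1;
    }
    memcpy(q->data + q->len, buf, len);
    q->len += len;
    return flush_queue(fd, q);
}
```

When send_or_queue() returns 1, the descriptor would be registered for writability (EPOLLOUT with epoll), and flush_queue() called each time the event fires until it returns 0.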

2
Damon On

There are two kinds of AIO under Linux.

One is kernel AIO. It is ugly and sometimes does not behave in accordance with the documentation (for example, it will run synchronously under certain conditions without you being able to do anything about it, and it will not properly cancel in-flight requests under certain conditions, etc.). It does not work on pipes.
These are the io_ kind of functions (io_setup, io_submit, io_getevents, ...). Note that you must link with -laio, and libaio has to be installed separately on some systems (e.g. Debian/Ubuntu).

The second is a pure userland implementation (glibc) which spawns threads on demand to handle requests. It is well documented, works reasonably well, and, according to the documentation, works with pretty much anything that is a file descriptor, including pipes.
These are the aio_ kind of functions. I would definitely recommend using these, even if they are an "uncool userland implementation" -- they work nicely.
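For illustration, a small sketch of the aio_ interface reading from a pipe. The helper name aio_read_and_wait() is hypothetical, and waiting immediately with aio_suspend() defeats the purpose of asynchronous I/O; it is done here only to keep the example short. On older glibc versions this needs -lrt at link time.

```c
#include <aio.h>
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: start an asynchronous read with aio_read(), wait
 * for completion, and return the byte count (or -1 with errno set).
 * A real program would do other work between the two steps. */
ssize_t aio_read_and_wait(int fd, void *buf, size_t len)
{
    struct aiocb cb;
    const struct aiocb *list[1] = { &cb };

    assert(buf != NULL);
    memset(&cb, 0, sizeof cb);
    cb.aio_fildes = fd;
    cb.aio_buf    = buf;
    cb.aio_nbytes = len;
    cb.aio_offset = 0;                          /* not meaningful for pipes */
    cb.aio_sigevent.sigev_notify = SIGEV_NONE;  /* no signal, we poll below */

    if (aio_read(&cb) == -1)
        return -1;

    /* cb lives on this stack frame, so do not return while it is in flight. */
    while (aio_error(&cb) == EINPROGRESS)
        aio_suspend(list, 1, NULL);

    int err = aio_error(&cb);
    ssize_t n = aio_return(&cb);                /* call exactly once per request */
    if (err != 0) {
        errno = err;
        return -1;
    }
    return n;
}
```

Completion can also be reported through cb.aio_sigevent (a signal or a callback thread) instead of polling with aio_error().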

Both now work with eventfd as a notification mechanism, by the way, though the kernel version was still undocumented last time I looked (but the function is in the headers).

Or, as Ambroz Bizjak pointed out, skip AIO altogether; for what you describe it's not strictly necessary.

EDIT:
On a different note, since you used the words "pipes" and "sockets", are you aware of vmsplice and splice? Those are probably the most efficient functions to send data to/from sockets/pipes. Unfortunately, they are another one of those ambiguously documented, hard-to-understand hacks with obscure pitfalls. Proceed at your own risk; you have been warned.

splice lets you transfer data from a socket (or any file descriptor) to a pipe, or the other way around. vmsplice lets you transfer data between application space and a pipe.
Ironically, vmsplice is ideally supposed to do the exact same thing (remap pages, a.k.a. "play with VM") that one particular person took as argument to claim that all BSD developers are idiots, back in 2006.
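To make the two calls concrete, here is a minimal sketch. The wrapper names buffer_to_pipe() and pipe_to_fd() are hypothetical; the flags argument is 0 throughout, so this does not use SPLICE_F_GIFT or any of the other splice flags.

```c
#define _GNU_SOURCE            /* splice() and vmsplice() are Linux-specific */
#include <assert.h>
#include <fcntl.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical wrapper: feed one user-space buffer into the write end of
 * a pipe with vmsplice(). Returns the bytes transferred or -1. */
ssize_t buffer_to_pipe(int pipe_wr, void *buf, size_t len)
{
    struct iovec iov = { .iov_base = buf, .iov_len = len };

    assert(buf != NULL);
    return vmsplice(pipe_wr, &iov, 1, 0);
}

/* Hypothetical wrapper: move up to len bytes from the read end of a pipe
 * to another descriptor (socket, file, or pipe) with splice().
 * Returns the bytes moved, 0 at end of input, or -1. */
ssize_t pipe_to_fd(int pipe_rd, int out_fd, size_t len)
{
    /* NULL offsets: use (and advance) each descriptor's own position. */
    return splice(pipe_rd, NULL, out_fd, NULL, len, 0);
}
```

At least one side of every splice() call has to be a pipe, which is why data headed for a socket goes through a pipe first.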

So much for the good news. The bad news is that there is a "secret limit" to how much data you can move. As far as I remember it's 64kB (but configurable somewhere in /proc). If you have more data than that, you must therefore work in several chunks, presumably with several pipe buffers, filling one while the other is read, and reusing old pipe buffers after they are done.
And this is where it gets complicated. If you browse through the discussions on KernelTrap, you find that even the Grand Master is not 100% sure about when it's safe to overwrite an old buffer when juggling several buffers.
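The limit in question is the pipe's capacity. On reasonably recent kernels it can be queried and changed per pipe with fcntl(), with /proc/sys/fs/pipe-max-size as the ceiling for unprivileged processes. A small sketch follows; the helper names pipe_capacity(), grow_pipe() and chunks_needed() are made up for this example.

```c
#define _GNU_SOURCE            /* F_GETPIPE_SZ and F_SETPIPE_SZ */
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Current capacity of the pipe in bytes, or -1 on error. */
int pipe_capacity(int pipe_fd)
{
    return fcntl(pipe_fd, F_GETPIPE_SZ);
}

/* Ask the kernel for a capacity of at least `bytes`. Returns the capacity
 * actually set (the kernel may round up), or -1 on error. */
int grow_pipe(int pipe_fd, int bytes)
{
    assert(bytes > 0);
    return fcntl(pipe_fd, F_SETPIPE_SZ, bytes);
}

/* How many splice/vmsplice rounds it takes to push len bytes through a
 * pipe of the given capacity when each round fills the pipe once. */
size_t chunks_needed(size_t len, size_t capacity)
{
    return (len + capacity - 1) / capacity;
}
```

This only addresses the size of each chunk; it does not answer the harder question of when a buffer handed to vmsplice can safely be reused.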

Also, for vmsplice to really work (i.e. remapping pages instead of copying), you need to use the "GIFT" flag (SPLICE_F_GIFT), and at least to me it's not clear from the docs what becomes of that memory then. Following the docs to the letter, you would need to leak memory, since you are never allowed to touch it again. Of course that can't be it. Maybe I'm just stupid.

I eventually gave up on this, and just settled for using epoll for readiness and non-blocking sockets with plain normal write. That combination is maybe not the utmost performer, but it is well-documented and works as documented.