Asynchronous read in C


How do I read a portion of a file using pread(2) in C? I've tried using the O_NONBLOCK flag on open(), but the read still seems to block the thread (i.e. reading a large portion at once, 1GB, seems to hang the thread for about 300ms on my M1 Mac).

I've looked into aio_read() but that does not seem to work at all on some systems and seems to either spawn a new thread to run the callback, or run all callbacks on one thread, which is not suitable for my needs.

Ideally I'd like a solution in which the process does not have to do any waiting and can simply poll to check if the buffer has been populated, something analogous to how nodejs would implement fs.read(fd, ..., callback)

What is the proper way to achieve what I'm looking for? I've searched online for hours but most questions seem to concern sockets / pipes and not actual files (where read() literally waits for new data to arrive)

Context: server software with high CPU loads and very high disk usage (multiple million reads per minute of about 1KB-4KB each), the vast majority of reads will be made to an NVMe drive (with average 0.01ms response time, but expected to be much higher under load)

The current accepted solution seems to be using thread pools, any information on the overhead of frequent thread switching would also be useful


There are 4 answers

0
BlobKat On BEST ANSWER

As pointed out in one of the comments on this question, the most widely accepted way to do this is synchronously with a thread pool, which is what I will be using.
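For anyone landing here, this is a minimal sketch of what such a thread pool could look like, assuming POSIX threads and C11 atomics. Worker threads do the blocking pread, and the submitting thread only polls a completion flag. All names (read_req, read_pool, pool_start, pool_submit, req_is_done, pool_stop) and the cap of 8 workers are made up for illustration; this is not the code I ended up with, just one possible shape.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_WORKERS 8 /* arbitrary cap for this sketch */

/* One read request (hypothetical layout). */
struct read_req {
    int fd;
    void *buf;
    size_t len;
    off_t off;
    ssize_t result;        /* pread() return value, valid once done != 0 */
    atomic_int done;       /* set to 1 by a worker when the read finished */
    struct read_req *next; /* queue link, owned by the pool */
};

struct read_pool {
    pthread_mutex_t mu;
    pthread_cond_t cv;
    struct read_req *head, *tail;
    int stopping;
    int nthreads;
    pthread_t threads[MAX_WORKERS];
};

/* Worker: sleep until a request is queued, run the blocking pread, flag it. */
static void *worker(void *arg) {
    struct read_pool *p = arg;
    for (;;) {
        pthread_mutex_lock(&p->mu);
        while (!p->head && !p->stopping)
            pthread_cond_wait(&p->cv, &p->mu);
        struct read_req *r = p->head;
        if (!r) { /* stopping and the queue is drained */
            pthread_mutex_unlock(&p->mu);
            return NULL;
        }
        p->head = r->next;
        if (!p->head) p->tail = NULL;
        pthread_mutex_unlock(&p->mu);

        r->result = pread(r->fd, r->buf, r->len, r->off);
        atomic_store(&r->done, 1); /* publish result to the poller */
    }
}

static int pool_start(struct read_pool *p, int nthreads) {
    if (nthreads < 1 || nthreads > MAX_WORKERS) return -1;
    pthread_mutex_init(&p->mu, NULL);
    pthread_cond_init(&p->cv, NULL);
    p->head = p->tail = NULL;
    p->stopping = 0;
    p->nthreads = 0;
    for (int i = 0; i < nthreads; i++) {
        if (pthread_create(&p->threads[i], NULL, worker, p) != 0) {
            fprintf(stderr, "pool_start: pthread_create failed\n");
            break;
        }
        p->nthreads++;
    }
    return p->nthreads > 0 ? 0 : -1;
}

/* Queue a request; returns immediately without touching the disk. */
static void pool_submit(struct read_pool *p, struct read_req *r) {
    atomic_store(&r->done, 0);
    r->next = NULL;
    pthread_mutex_lock(&p->mu);
    if (p->tail) p->tail->next = r; else p->head = r;
    p->tail = r;
    pthread_cond_signal(&p->cv);
    pthread_mutex_unlock(&p->mu);
}

/* Non-blocking completion check for the submitting thread. */
static int req_is_done(struct read_req *r) { return atomic_load(&r->done); }

/* Let workers finish queued requests, then join them. */
static void pool_stop(struct read_pool *p) {
    pthread_mutex_lock(&p->mu);
    p->stopping = 1;
    pthread_cond_broadcast(&p->cv);
    pthread_mutex_unlock(&p->mu);
    for (int i = 0; i < p->nthreads; i++) pthread_join(p->threads[i], NULL);
    pthread_mutex_destroy(&p->mu);
    pthread_cond_destroy(&p->cv);
}
```

The caller fills in fd, buf, len and off, calls pool_submit, carries on with other work, and checks req_is_done whenever convenient. The request and its buffer must stay alive until the flag is set.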

9
Luis Colorado On

You say

How do I read a portion of a file using pread(2) in C? I've tried using the O_NONBLOCK flag on open(), but the read still seems to block the thread (i.e. reading a large portion at once, 1GB, seems to hang the thread for about 300ms on my M1 Mac).

Of course, O_NONBLOCK does not help here. The kernel doesn't block your process because it wants to, but because it is necessary. Opening the file with O_NONBLOCK will not make the read return early just because you asked for it. (For sockets, pipes and terminals, the flag makes a call that would otherwise block fail with EAGAIN instead; for regular files it has essentially no effect, so pread still waits until the data has been copied into your buffer.)

By the way, it would be quite strange for your kernel to read ahead 1 GB of your file at once. Your call will always block: the kernel does not hand over just what it already has. If you ask for 1 GB, your process waits (with the file's inode locked) until the data has come from the disk and, normally, the full amount has been passed to your process. This is not true for a socket or a pipe, but it is for a file. The kernel implements file reading as an atomic operation to deal with other threads or processes accessing the same file, and the inode stays locked for the duration, so no other system call can proceed on that file in the meantime. If you had used read or write instead of pread or pwrite, the result would have been the same; the only difference is that pread and pwrite do not use the file pointer at all, because you provide the offset in the call.
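To make the two points above concrete, here is a small sketch, assuming a POSIX system. pread_full loops until the requested range is read (POSIX still permits pread to return fewer bytes than requested, so robust code loops), and read_with_nonblock opens a regular file with O_NONBLOCK to show that the read behaves like an ordinary blocking one. Both function names are hypothetical helpers, not part of any library.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Read len bytes at offset off unless EOF or an error intervenes.
 * Returns the number of bytes read (less than len only at EOF),
 * or -1 with errno set. */
static ssize_t pread_full(int fd, void *buf, size_t len, off_t off) {
    size_t done = 0;
    while (done < len) {
        ssize_t n = pread(fd, (char *)buf + done, len - done, off + (off_t)done);
        if (n > 0) {
            done += (size_t)n;
        } else if (n == 0) {
            break;             /* end of file */
        } else if (errno != EINTR) {
            return -1;         /* real error; EINTR is simply retried */
        }
    }
    return (ssize_t)done;
}

/* Open a regular file with O_NONBLOCK and read from it. The flag is
 * accepted, but the read still completes like a plain blocking read. */
static ssize_t read_with_nonblock(const char *path, void *buf, size_t len, off_t off) {
    int fd = open(path, O_RDONLY | O_NONBLOCK);
    if (fd == -1) return -1;
    ssize_t n = pread_full(fd, buf, len, off);
    int saved = errno;         /* keep pread's errno across close() */
    close(fd);
    errno = saved;
    return n;
}
```

On a regular file, read_with_nonblock returns the data rather than failing with EAGAIN, which is why the flag did nothing for you.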

Reading 1 GB from disk (because, believe me, the file is stored on the disk) in only 300 ms is very good performance: it averages about 3.3 GB/s, probably for a single, non-threaded process. This may be the result of part of the data already being buffered (so no disk access is needed). Even if you have a disk capable of transferring 40.0Gb/s or more, remember that the disk driver must spend some time changing heads (if that applies) or servicing other processes' requests. A second factor that can slow your data access is that, if your process does nothing but compute (or nothing but I/O), the kernel will soon lower its priority to benefit processes doing shorter I/O; this is efficient for more dynamic processes, but it will affect yours negatively. Another issue is that using huge variables and a large amount of memory makes your process more likely to be swapped out, and it will then have to be swapped back in to satisfy the partial reads as they are written to the buffer you set aside. This will also slow down your program.

Ideally I'd like a solution in which the process does not have to do any waiting and can simply poll every millisecond to check if the buffer has been populated, something analogous to how nodejs would implement fs.read(fd, ..., callback)

That's not the way to get things done efficiently. Polling is a way of waiting for something to happen, but not an efficient one, so your paragraph is a bit self-contradictory. Today's computers do almost no polling of resources, because the relevant events (like a disk finishing a transfer) normally interrupt the CPU, which is diverted for a few µs to attend to the disk device and store the data in place. Only when things are finished is your process woken up and allowed to continue, instead of waiting in a loop and consuming expensive CPU time. The best way to get something from the kernel is to ask for it and remain waiting (blocked, not burning CPU cycles on something absurd like polling to see whether it is ready) until the kernel gives you the whole thing.

When you go to a burger restaurant to get a meal, you don't normally ask the waiter every minute whether your hamburger is ready. You just sit and wait for the waiter to come to your table with the hamburger ready to eat. The reason is that you want to eat it warm; there's no buffer of burgers waiting in a pool for you to ask for. O_NONBLOCK simply doesn't work here. Even if you made the system return EAGAIN in a loop until it answered differently, the hamburger would not be made any faster. The waiter would get pissed off (well, you have a very patient kernel) and you would finally be banned from the place.

If what you want is to be free to do something else in the meantime, this is never the solution. You have other resources at your disposal, like the select/poll system calls, or using O_NONBLOCK to carry on until you have nothing more to do and then blocking in a call without O_NONBLOCK (which will wait for the meal to be ready). It takes time to service you, and you must accept that, even if things happen so quickly that you cannot blink an eye in that time.

BTW, seeing how you pread the data (I mean, reading your actual code) would probably reveal more about the problem. You should read How to create a Minimal, Reproducible Example and include a complete program that can be run and tested for any error you may have in your code.

Sometimes, the best way to read a full bunch of data is just the following code:

#define ONE_GB (1024UL*1024*1024)
...
    static char buffer[ONE_GB];
    size_t buffer_size = 0;
    int c;
    /* test the size first, so no character is read and then dropped */
    while (buffer_size < ONE_GB && (c = fgetc(input)) != EOF) {
        /* append your character to the buffer */
        buffer[buffer_size++] = (char)c;
    }
    /* now you have up to a full GB of the file in memory */

But, again, beware: a 1 GB variable makes your process a likely candidate for being swapped out by the kernel, and it then has to be swapped back in when the data is transferred to user mode, causing several swap-ins and swap-outs and making things run slower.

Note: I don't know why you are using pread instead of just read, but if you think that specifying the offset in the call will make things go faster, you are wrong. The reason for having a system call that takes the offset is to avoid having the kernel update the file pointer, so that other processes doing sequential reads through a shared file pointer are not disturbed.

1
Misha T On

In modern Linux (version >= 5.1) you could try the io_uring asynchronous interface. An example of its use for disk file reads can be found here https://github.com/shuveb/io_uring-by-example/blob/master/03_cat_liburing/main.c

4
Jason On

This looks like a perfect job for mmap. You can use mmap to get a char* to the contents of the file directly.

#include <fcntl.h>      /* open */
#include <sys/stat.h>   /* fstat */
#include <sys/mman.h>   /* mmap, madvise */
#include <unistd.h>     /* close */
...
{
    int fd = open(file_name, O_RDONLY);
    if (fd == -1) { /* handle error */ }

    struct stat sb;
    if (fstat(fd, &sb) == -1) { /* handle error */ }

    char* map = mmap(NULL, sb.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (map == MAP_FAILED) { /* handle error */ }
    
    /* Since you mentioned you are doing a lot of random access,
     * use this hint to tell the kernel how you plan to use it.
     */
    madvise(map, sb.st_size, MADV_RANDOM);

    /* Once, you have the map, you actually don't need the file
     * descriptor open anymore...
     */
    close(fd);
}

Make sure to store the sb.st_size somewhere, as that is now your reference to the end of the file.

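The actual disk reads are driven by page faults: the first access to a page that is not yet in memory makes the kernel load it. Most of the data will then stay in memory until the kernel needs that memory for something else and evicts the pages (a read-only file mapping is simply dropped and re-read from the file later, not written to swap). The other awesome feature of mmap is how simple the code becomes once you have a mapping. You want byte 10000?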

char byte_10000 = map[10000];

Want to search for a specific character or string? Just throw the whole map into memchr or memmem. No reads, no system calls. You sort of just make the whole thing the kernel's problem =]