read() from files - blocking vs. non-blocking behavior


Let's assume we opened a file using fopen() and, from the file pointer received, fetched the file descriptor using fileno(). Then we do lots (>10^8) of random read()s of relatively small chunks, between 4 bytes and 10 KBytes in size, from this file:

Is it expected behaviour that such a read() might return fewer bytes than requested, without setting errno, if the file system is

  1. ext3

  2. NFS

  3. OCFS2

  4. combination of 2 and 3 (OCFS2 via NFS)

?

My reading led me to conclude that it should not be possible for 1. (as long as the file does not have O_NONBLOCK set, if it is even possible to set it for a file on ext3), but for the other three (2., 3., 4.) I'm uncertain.

(Btw: Can I assume that O_NONBLOCK is not set by default in any case?)

This question arose because I observed read()s returning fewer bytes than requested, without errno set, in case 4.

The problem with narrowing this down by testing is that such behaviour happens in <1/1000000000 cases ... - which is still too often :-}

Update: The average file size is between around 1 GByte and a few TBytes.

2

There are 2 answers

1
Kyle Jones On

You should not assume that read() will never return fewer bytes than requested, on any file system. This is particularly true for large reads, as POSIX.1 states that the behaviour of read() for sizes larger than SSIZE_MAX is implementation-defined. POSIX only guarantees that SSIZE_MAX is at least 32767 bytes (the value of _POSIX_SSIZE_MAX), although mainstream systems usually define it much larger. The fact that read() always returns the full amount today does not mean that it will in the future.

One possible reason might be that I/O priorities become more fully fleshed out in the kernel in the future. E.g. you're trying to read from the same device as another, higher-priority process, and the other process would get better throughput if your process weren't causing head movement away from the sectors the other process wants. The kernel might choose to give your read() a short count to get you out of the way for a while, instead of continuing to do inefficient interleaved block reads. Stranger things have been done for the sake of I/O efficiency. What is not prohibited often becomes compulsory.

0
alk On

We solved the problem described, namely read() returning fewer bytes than requested when reading from a file located on an NFS mount pointing to an OCFS2 file system (case 4 in my question).

With the setup mentioned above, read()s on file descriptors do sometimes return fewer bytes than requested, without errno being set.

To get all the data, it is enough to call read() again and again until the requested amount of data has been read.
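The retry approach described above could look roughly like the following. This is a minimal sketch, not the code we actually used: the helper name read_fully() is invented for this example, and it assumes count does not exceed SSIZE_MAX so that the total fits into the return type.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not from the original post): keep calling read()
 * until `count` bytes have arrived, end-of-file is reached, or an error
 * occurs. Returns the number of bytes read (less than `count` only at
 * end-of-file), or -1 with errno set by read().
 * Assumption: count <= SSIZE_MAX. */
static ssize_t read_fully(int fd, void *buf, size_t count)
{
    unsigned char *p = buf;
    size_t total = 0;

    assert(buf != NULL || count == 0);

    while (total < count) {
        ssize_t n = read(fd, p + total, count - total);
        if (n > 0) {
            total += (size_t)n;   /* possibly a short read: ask for the rest */
        } else if (n == 0) {
            break;                /* end-of-file */
        } else if (errno != EINTR) {
            return -1;            /* real error, errno was set by read() */
        }
        /* n == -1 with EINTR: interrupted before any data arrived, retry */
    }
    return (ssize_t)total;
}
```

With such a wrapper, a return value smaller than count unambiguously means end-of-file rather than a short read.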

Moreover, such a setup sometimes makes read() fail with EIO, and even then a simple re-read() succeeds and the data arrives.

My conclusion: Reading from OCFS2 via NFS makes read()ing from files behave like read()ing from sockets, which is inconsistent with the specification of read() (http://pubs.opengroup.org/onlinepubs/9699919799/functions/read.html):
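Retrying after EIO can be sketched as below. This is an illustration under assumptions, not our production code: the helper name pread_retry_eio() and the max_retries parameter are invented here, and pread() is used instead of lseek() plus read() so that every attempt starts at the same, explicitly given offset. The retry budget keeps a genuinely broken device from causing an endless loop.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (name and retry policy are assumptions): perform one
 * pread() at `offset`, repeating the call up to `max_retries` times if it
 * fails with EIO. Any other error is returned immediately. On success the
 * result may still be a short count. */
static ssize_t pread_retry_eio(int fd, void *buf, size_t count,
                               off_t offset, int max_retries)
{
    int retries_left = max_retries;

    assert(max_retries >= 0);

    for (;;) {
        ssize_t n = pread(fd, buf, count, offset);
        if (n >= 0)
            return n;             /* data (or end-of-file) arrived */
        if (errno == EINTR)
            continue;             /* interrupted by a signal, not an I/O error */
        if (errno == EIO && retries_left > 0) {
            retries_left--;       /* transient I/O error: try again */
            continue;
        }
        return -1;                /* other error, or retry budget used up */
    }
}
```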

When attempting to read a file (other than a pipe or FIFO) that supports non-blocking reads and has no data currently available:

If O_NONBLOCK is set, read() shall return -1 and set errno to [EAGAIN].

If O_NONBLOCK is clear, read() shall block the calling thread until some data becomes available.

Needless to say, we never set, or even considered setting, O_NONBLOCK on the file descriptors in question.
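Whether O_NONBLOCK is set on a descriptor can be verified at run time rather than assumed. A minimal sketch using fcntl() with F_GETFL follows; the helper name is_nonblocking() is made up for this example.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper (name invented for this example): returns 1 if
 * O_NONBLOCK is set on `fd`, 0 if it is clear, and -1 if fcntl() fails
 * (errno is then set by fcntl()). */
static int is_nonblocking(int fd)
{
    int flags;

    assert(fd >= 0);

    flags = fcntl(fd, F_GETFL);   /* fetch the file status flags */
    if (flags == -1)
        return -1;
    return (flags & O_NONBLOCK) ? 1 : 0;
}
```

Calling it on the descriptor obtained from fileno() right after fopen() shows whether the flag is clear in a given environment.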