Clarification on how pipe() and dup2() work in C


I am writing a simple shell that handles piping. I have working code, but I don't quite understand how it all works under the hood. Here is a modified code snippet I need help understanding (I removed error checking to shorten it):

int fd[2];
pipe(fd);

if (fork()) { /* parent code */
    close(fd[1]);
    dup2(fd[0], 0);

    /* call to execve() here */

} else { /* child code */
    close(fd[0]);
    dup2(fd[1], 1);
}

I have guesses for my questions, but that's all they are - guesses. Here are the questions I have:

  1. Where is the blocking performed? In all the example code I've seen, read() and write() provide the blocking, but I didn't need to use them here. I just pointed STDIN at the read end of the pipe and STDOUT at the write end of the pipe. What I'm guessing is happening is that STDIN does the blocking after dup2(fd[0], 0) is executed. Is this correct?
  2. From what I understand, there is a descriptor table for each running process that points to the open files in the file table. What happens when a process redirects STDIN, STDOUT, or STDERR? Are these file descriptors shared across all processes' descriptor tables? Or are there copies for each process? Does redirecting one cause changes to be reflected among all of them?
  3. After a call to pipe() and then a subsequent call to fork() there are 4 "ends" of the pipe open: A read and a write end accessed by the parent and a read and a write end accessed by the child. In my code, I close the parent's write end and the child's read end. However, I don't close the remaining two ends after I'm done with the pipe. The code works fine, so I assume that some sort of implicit closing is done, but that's all guess work. Should I be adding explicit calls to close the remaining two ends, like this?

    int fd[2];
    pipe(fd);
    
    if (fork()) { /* parent code */
        close(fd[1]);
        dup2(fd[0], 0);
    
        /* call to execve() here */
    
        close(fd[0]);
    
    } else { /* child code */
        close(fd[0]);
        dup2(fd[1], 1);
        close(fd[1]);
    }
    
  4. This is more of a conceptual question about how the piping process works. There is the read end of the pipe, referred to by the file handle fd[0], and the write end of the pipe, referred to by the file handle fd[1]. The pipe itself is just an abstraction represented by a byte stream. The file handles represent open files, correct? So does that mean that somewhere in the system, there is a file (pointed at by fd[1]) that has all the information we want to send down the pipe written to it? And that after pushing that information through the byte stream, there is a file (pointed at by fd[0]) that has all that information written to it as well, thus creating the abstraction of a pipe?


There are 2 answers

Nicholas Wilson (Best Answer)
  1. Nothing in the code you've provided blocks. fork, dup2, and close all return immediately; execution does not pause anywhere in the lines you've posted. If you're observing any waiting or hanging, it's elsewhere in your code (e.g. in a call to waitpid, select, or read).

  2. Each process has its own file descriptor table. The file objects are global across all processes (and a file in the file system may be open multiple times, with different file objects representing it), but the file descriptors are per-process: they are each process's way of referring to the file objects. So a file descriptor like "1" or "2" only has meaning inside your process; "file number 1" and "file number 2" probably mean something different to another process. It is, however, possible for two processes to reference the same file object, even though each may use a different number for it.

    So, technically, that's why there are two sets of flags: the file descriptor flags, which aren't shared between processes (FD_CLOEXEC), and the file object flags (such as O_NONBLOCK), which are shared even between processes. The sketch after this list illustrates the difference.

    Unless you do something weird like freopen on stdin/stdout/stderr (rare), they're just synonyms for fds 0, 1, and 2. When you want to write raw bytes, call write with the file descriptor number; when you want to write formatted strings, call fprintf with stdout or stderr -- either way the bytes end up in the same place.

  3. No implicit closing is done while your processes are running (the descriptors are only released when each process exits), so you're just getting away with it. Yes, you should close file descriptors when you're done with them -- technically, I'd write if (fd[0] != 0) close(fd[0]); just to make sure that, in the unlikely case pipe() handed you descriptor 0, you don't close the standard input you just set up!

  4. Nope, there's nothing written to disk. The pipe is a memory-backed buffer managed by the kernel, and it never needs to be stored anywhere. When you write to a "regular" file on the disk, the written data is stored by the kernel in a buffer, and then passed on to the disk as soon as possible to commit. When you write to a pipe, it goes to a kernel-managed buffer just the same, but it won't normally go to disk. It just sits there until it's read by the reading end of the pipe, at which point the kernel discards it rather than saving it.

    The pipe has a read and a write end, so written data always goes at the tail of the buffer, and data that's read out gets taken from the head of the buffer and then removed. So there's a strict ordering to the flow, just like in a physical pipe: the water drops that go in one end first come out first from the other end. If the tap at the far end is closed (the process isn't reading), then once the buffer fills up you can't push (write) more data into your end of the pipe. If no data is being written and the pipe empties, you have to wait when reading until more data comes through.
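
Here's a quick sketch to make the distinction in point 2 concrete (my own toy program, not part of your shell). It duplicates the read end of a pipe with dup, then shows that FD_CLOEXEC, set with fcntl(F_SETFD), sticks to one descriptor only, while O_NONBLOCK, set with fcntl(F_SETFL), becomes visible through every descriptor that shares the file object:

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        int fd[2];
        if (pipe(fd) == -1) return 1;

        /* A second descriptor referring to the same file object as fd[0]. */
        int dupfd = dup(fd[0]);

        /* FD_CLOEXEC is a descriptor flag: setting it on fd[0] does not
           affect dupfd. */
        fcntl(fd[0], F_SETFD, FD_CLOEXEC);
        printf("FD_CLOEXEC  fd[0]=%d dupfd=%d\n",
               fcntl(fd[0], F_GETFD) & FD_CLOEXEC,
               fcntl(dupfd, F_GETFD) & FD_CLOEXEC);        /* 1 and 0 */

        /* O_NONBLOCK is a file object (status) flag: setting it through one
           descriptor is visible through the other. */
        fcntl(fd[0], F_SETFL, O_NONBLOCK);
        printf("O_NONBLOCK  fd[0]=%d dupfd=%d\n",
               (fcntl(fd[0], F_GETFL) & O_NONBLOCK) != 0,
               (fcntl(dupfd, F_GETFL) & O_NONBLOCK) != 0); /* 1 and 1 */

        close(fd[0]); close(fd[1]); close(dupfd);
        return 0;
    }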

Nicola Musatti

First of all, you usually call execve or one of its sister calls in the child process, not in the parent. Remember that a parent knows who its children are, but not vice versa.
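
For illustration, a minimal sketch of the usual arrangement for something like "ls | wc -l" could look like the following (using execlp rather than execve for brevity, with placeholder commands, and with both the children and the shell closing the pipe ends they don't use):

    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void) {
        int fd[2];
        if (pipe(fd) == -1) return 1;

        pid_t writer = fork();
        if (writer == 0) {          /* first child: produces output */
            dup2(fd[1], 1);         /* stdout -> write end of the pipe */
            close(fd[0]);
            close(fd[1]);
            execlp("ls", "ls", (char *)0);
            _exit(127);             /* only reached if exec fails */
        }

        pid_t reader = fork();
        if (reader == 0) {          /* second child: consumes input */
            dup2(fd[0], 0);         /* stdin <- read end of the pipe */
            close(fd[0]);
            close(fd[1]);
            execlp("wc", "wc", "-l", (char *)0);
            _exit(127);
        }

        /* The shell itself must close its copies, otherwise the reader
           never sees end-of-file on the pipe. */
        close(fd[0]);
        close(fd[1]);
        waitpid(writer, 0, 0);
        waitpid(reader, 0, 0);
        return 0;
    }

This way the shell keeps running after the pipeline finishes, which is exactly why the exec calls go in the children.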

Underneath, a pipe is really a buffer handled by the operating system, in such a way that an attempt to write to it blocks if the buffer is full and an attempt to read from it blocks if there is nothing to read. This is where the blocking you experience comes from.
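
A toy example along these lines (not from your shell) shows that blocking in action: the parent's read() simply waits until the child gets around to writing, and symmetrically a write() would block once the kernel's pipe buffer (typically 64 KiB on Linux) fills up:

    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        int fd[2];
        char buf[32];
        if (pipe(fd) == -1) return 1;

        if (fork() == 0) {          /* child: write after a delay */
            close(fd[0]);
            sleep(2);               /* keep the pipe empty for a while */
            write(fd[1], "hello", 5);
            close(fd[1]);
            _exit(0);
        }

        close(fd[1]);               /* parent keeps only the read end */
        /* This read() blocks for about two seconds, until the child writes. */
        ssize_t n = read(fd[0], buf, sizeof buf);
        printf("got %zd bytes: %.*s\n", n, (int)n, buf);
        close(fd[0]);
        return 0;
    }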

In the good old days, when buffers were small and computers were slow, you could actually rely on the reading process being woken up repeatedly, even for smallish amounts of data, say on the order of tens of kilobytes. Nowadays the reading process often gets its input in a single shot.