Understanding dup2 and closing file descriptors

16.9k views Asked by At

I'm posting my code simply for context of my question. I'm not explicitly looking for you to help fix it, I'm more so looking to understand the dup2 system call that I'm just not picking up from the man page and the numerous other stackoverflow questions.

    pid = fork();

    if(pid == 0) {
        if(strcmp("STDOUT", outfile)) {
            if (command->getOutputFD() == REDIRECT) {
                if ((outfd = open(outfile, O_CREAT | O_WRONLY | O_TRUNC)) == -1)
                    return false;
                command->setOutputFD(outfd);
                if (dup2(command->getOutputFD(), STDOUT_FILENO) == -1)
                    return false;
                pipeIndex++;
            }
            else if (command->getOutputFD() == REDIRECTAPPEND) {
                if ((outfd = open(outfile, O_CREAT | O_WRONLY | O_APPEND)) == -1)
                    return false;
                command->setOutputFD(outfd);
                if (dup2(command->getOutputFD(), STDOUT_FILENO) == -1)
                    return false;
                pipeIndex++;
            }
            else {
                if (dup2(pipefd[++pipeIndex], STDOUT_FILENO) == -1)
                    return false;
                command->setOutputFD(pipefd[pipeIndex]);
            }
        }

        if(strcmp("STDIN", infile)) {
            if(dup2(pipefd[pipeIndex - 1], STDIN_FILENO) == -1)
                return false;
            command->setOutputFD(pipefd[pipeIndex - 1]);
            pipeIndex++;
        }


        if (execvp(arguments[0], arguments) == -1) {
            std::cerr << "Error!" << std::endl;
            _Exit(0);
        }

    }

    else if(pid == -1) {
        return false;
    }

For context to you, that code represents the execution step of a basic linux shell. The command object contains the commands arguments, IO "name", and IO descriptors (I think I might get rid of the file descriptors as fields).

What I'm having the most difficultly understanding is when and which file descriptors to close. I guess I'll just ask some questions to try and improve my understanding of the concept.

1) With my array of file descriptors used for handling pipes, the parent has a copy of all those descriptors. When are the descriptors held by the parent closed? And even more so, which descriptors? Is it all of them? All of the ones left unused by the executing commands?

2) When handling pipes within the children, which descriptors are left open by which processes? Say if I execute the command: ls -l | grep "[username]", Which descriptors should be left open for the ls process? Just the write end of the pipe? And if so when? The same question applies to the grep command.

3) When I handle redirection of IO to a file, a new file must be opened and duped to STDOUT (I do not support input redirection). When does this descriptor get closed? I've seen in examples that it gets closed immediately after the call to dup2, but then how does anything get written to the file if the file has been closed?

Thanks ahead of time. I've been stuck on this problem for days and I'd really like to be done with this project.


EDIT I've updated this with modified code and sample output for anyone interested in offering specific help to my issue. First I have the entire for loop that handles execution. It has been updated with my calls to close on various file descriptors.

while(currCommand != NULL) {

    command = currCommand->getData();

    infile = command->getInFileName();
    outfile = command->getOutFileName();
    arguments = command->getArgList();

    pid = fork();

    if(pid == 0) {
        if(strcmp("STDOUT", outfile)) {
            if (command->getOutputFD() == REDIRECT) {
                if ((outfd = open(outfile, O_CREAT | O_WRONLY | O_TRUNC)) == -1)
                    return false;
                if (dup2(outfd, STDOUT_FILENO) == -1)
                    return false;
                close(STDOUT_FILENO);
            }
            else if (command->getOutputFD() == REDIRECTAPPEND) {
                if ((outfd = open(outfile, O_CREAT | O_WRONLY | O_APPEND)) == -1)
                    return false;
                if (dup2(outfd, STDOUT_FILENO) == -1)
                    return false;
                close(STDOUT_FILENO);
            }
            else {
                if (dup2(pipefd[pipeIndex + 1], STDOUT_FILENO) == -1)
                    return false;
                close(pipefd[pipeIndex]);
            }
        }
        pipeIndex++;

        if(strcmp("STDIN", infile)) {
            if(dup2(pipefd[pipeIndex - 1], STDIN_FILENO) == -1)
                return false;
            close(pipefd[pipeIndex]);
            pipeIndex++;
        }

        if (execvp(arguments[0], arguments) == -1) {
            std::cerr << "Error!" << std::endl;
            _Exit(0);
        }
    }

    else if(pid == -1) {
        return false;
    }

    currCommand = currCommand->getNext();

}

for(int i = 0; i < numPipes * 2; i++)
    close(pipefd[i]);

for(int i = 0; i < commands->size();i++) {
    if(wait(status) == -1)
        return false;
}

When executing this code I receive the following output

ᕕ( ᐛ )ᕗ ls -l
total 68
-rwxrwxrwx 1 cook cook   242 May 31 18:31 CMakeLists.txt
-rwxrwxrwx 1 cook cook   617 Jun  1 22:40 Command.cpp
-rwxrwxrwx 1 cook cook  9430 Jun  8 18:02 ExecuteExternalCommand.cpp
-rwxrwxrwx 1 cook cook   682 May 31 18:35 ExecuteInternalCommand.cpp
drwxrwxrwx 2 cook cook  4096 Jun  8 17:16 headers
drwxrwxrwx 2 cook cook  4096 May 31 18:32 implementation files
-rwxr-xr-x 1 cook cook 25772 Jun  8 18:12 LeShell
-rwxrwxrwx 1 cook cook   243 Jun  5 13:02 Makefile
-rwxrwxrwx 1 cook cook   831 Jun  3 12:10 Shell.cpp
ᕕ( ᐛ )ᕗ ls -l > output.txt
ls: write error: Bad file descriptor
ᕕ( ᐛ )ᕗ ls -l | grep "cook"
ᕕ( ᐛ )ᕗ 

The output of ls -l > output.txt implies that I'm closing the wrong descriptor, but closing the other related descriptor, while rendering no error, provides no output to the file. As demonstrated by ls -l, grep "cook", should generate output to the console.

1

There are 1 answers

2
Filipe Gonçalves On BEST ANSWER

With my array of file descriptors used for handling pipes, the parent has a copy of all those descriptors. When are the descriptors held by the parent closed? And even more so, which descriptors? Is it all of them? All of the ones left unused by the executing commands?

A file descriptor may be closed in one of 3 ways:

  1. You explicitly call close() on it.
  2. The process terminates, and the operating system automatically closes every file descriptor that was still open.
  3. When the process calls one of the seven exec() functions and the file descriptor has the O_CLOEXEC flag.

As you can see, most of the times, file descriptors will remain open until you manually close them. This is what happens in your code too - since you didn't specify O_CLOEXEC, file descriptors are not closed when the child process calls execvp(). In the child, they are closed after the child terminates. The same goes for the parent. If you want that to happen any time before terminating, you have to manually call close().

When handling pipes within the children, which descriptors are left open by which processes? Say if I execute the command: ls -l | grep "[username]", Which descriptors should be left open for the ls process? Just the write end of the pipe? And if so when? The same question applies to the grep command.

Here's a (rough) idea of what the shell does when you type ls -l | grep "username":

  1. The shell calls pipe() to create a new pipe. The pipe file descriptors are inherited by the children in the next step.
  2. The shell forks twice, let's call these processes c1 and c2. Let's assume c1 will run ls and c2 will run grep.
  3. In c1, the pipe's read channel is closed with close(), and then it calls dup2() with the pipe write channel and STDOUT_FILENO, so as to make writing to stdout equivalent to writing to the pipe. Then, one of the seven exec() functions is called to start executing ls. ls writes to stdout, but since we duplicated stdout to the pipe's write channel, ls will be writing to the pipe.
  4. In c2, the reverse happens: the pipe's write channel is closed, and then dup2() is called to make stdin point to the pipe's read channel. Then, one of the seven exec() functions is called to start executing grep. grep reads from stdin, but since we dup2()'d standard input to the pipe's read channel, grep will be reading from the pipe.

When I handle redirection of IO to a file, a new file must be opened and duped to STDOUT (I do not support input redirection). When does this descriptor get closed? I've seen in examples that it gets closed immediately after the call to dup2, but then how does anything get written to the file if the file has been closed?

So, when you call dup2(a, b), either one of these is true:

  • a == b. In this case, nothing happens and dup2() returns prematurely. No file descriptors are closed.
  • a != b. In this case, b is closed if necessary, and then b is made to refer to the same file table entry as a. The file table entry is a structure that contains the current file offset and file status flags; multiple file descriptors can point to the same file table entry, and that's exactly what happens when you duplicate a file descriptor. So, dup2(a, b) has the effect of making a and b share the same file table entry. As a consequence, writing to a or b will end up writing to the same file. So the file that is closed is b, not a. If you dup2(a, STDOUT_FILENO), you close stdout and you make stdout's file descriptor point to the same file table entry as a. Any program that writes to stdout will then be writing to the file instead, since stdout's file descriptor is pointing to the file you dupped.

UPDATE:

So, for your specific problem, here's what I have to say after briefly looking through the code:

You shouldn't be calling close(STDOUT_FILENO) in here:

if (command->getOutputFD() == REDIRECT) {
    if ((outfd = open(outfile, O_CREAT | O_WRONLY | O_TRUNC)) == -1)
        return false;
    if (dup2(outfd, STDOUT_FILENO) == -1)
        return false;
    close(STDOUT_FILENO);
}

If you close stdout, you will get an error in the future when you try to write to stdout. This is why you get ls: write error: Bad file descriptor. After all, ls is writing to stdout, but you closed it. Oops!

You're doing it backwards: you want to close outfd instead. You opened outfd so that you could redirect STDOUT_FILENO to outfd, once the redirection is done, you don't really need outfd anymore and you can close it. But you most definitely don't want to close stdout because the idea is to have stdout write to the file that was referenced by outfd.

So, go ahead and do that:

if (command->getOutputFD() == REDIRECT) {
    if ((outfd = open(outfile, O_CREAT | O_WRONLY | O_TRUNC)) == -1)
        return false;
    if (dup2(outfd, STDOUT_FILENO) == -1)
        return false;
    if (outfd != STDOUT_FILENO)
        close(outfd);
}

Note the final if is necessary: If outfd by any chance happens to be equal to STDOUT_FILENO, you don't want to close it for the reasons I just mentioned.

The same applies to the code inside else if (command->getOutputFD() == REDIRECTAPPEND): you want to close outfd rather than STDOUT_FILENO:

else if (command->getOutputFD() == REDIRECTAPPEND) {
    if ((outfd = open(outfile, O_CREAT | O_WRONLY | O_APPEND)) == -1)
        return false;
    if (dup2(outfd, STDOUT_FILENO) == -1)
        return false;
    if (outfd != STDOUT_FILENO)
        close(STDOUT_FILENO);
}

This should at least get you ls -l to work as expected.

As for the problem with the pipes: your pipe management is not really correct. It's not clear from the code you showed where and how pipefd is allocated, and how many pipes you create, but notice that:

  1. A process will never be able to read from a pipe and write to another pipe. For example, if outfile is not STDOUT and infile is not STDIN, you end up closing both the read and the write channels (and worse yet, after closing the read channel, you attempt to duplicate it). There is no way this will ever work.
  2. The parent process is closing every pipe before waiting for the termination of the children. This provokes a race condition.

I suggest redesigning the way you manage pipes. You can see an example of a working bare-bones shell working with pipes in this answer: https://stackoverflow.com/a/30415995/2793118