What would be the proper mental model helpful in understanding of bash redirection?


What would be a consistent explanation of the mechanisms behind the following bash session logs, one that helps in understanding them and makes it possible to reliably predict the outcome of similarly constructed commands:

~ $ cat f
abc123
~ $ cat g
cat: g: No such file or directory
~ $ cat f g 2>&1 >h
cat: g: No such file or directory
~ $ cat h
abc123

and

~ $ cmd >a 2>&1
~ $ cat a
Command 'cmd' not found, but there are 19 similar ones.
~ $ cmd 2>&1 >a
Command 'cmd' not found, but there are 19 similar ones.
~ $ cat a
~ $  

and

~ $ echo abc > a; echo bcd > b; echo cde > c
~ $ cat a
abc
~ $ cat b
bcd
~ $ cat c
cde
~ $ cmd >a>b>c 2>&1
~ $ cat a
~ $ cat b
~ $ cat c
Command 'cmd' not found, but there are 19 similar ones.

?

Have you succeeded in correctly predicting the outcome of the commands above in advance? If yes, how did you arrive at the right prediction?


There are 2 answers

KamilCuk On

What would be the proper mental model helpful in understanding of bash redirection?

Reading through the comments, my guess is that you are confused by the fact that file descriptors are not files, but really pointers to handles to files.

The "mental model" of file descriptors is explained in many places, even on Wikipedia: https://en.wikipedia.org/wiki/File_descriptor . A file descriptor is an index into the process's descriptor table; that entry points to an entry in the file table, which in turn points into the inode table (like a double pointer). Consider researching what a file descriptor is and how it works.

cmd 2>&1 >a left to right ... cmd is executed and throws an error, the stderr is redirected to stdout and the standard out is redirected to file a, so the error message should be written to file a, but it doesn't

They are pointers. At 2>&1, stderr is redirected to where stdout is pointing at that moment. Then >a opens file a, creating an entry in the file table, and redirects stdout to that entry. Stderr still points to the file table entry that stdout pointed to at the time 2>&1 was processed.
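The ordering effect described above can be reproduced with a small sketch. The helper `emit` and the scratch directory are hypothetical names introduced only for this illustration; a command substitution stands in for the terminal so that whatever "escapes" the file can be inspected.

```shell
#!/usr/bin/env bash
# emit is a hypothetical helper: one line to stdout, one line to stderr.
emit() { echo out; echo err >&2; }

tmp=$(mktemp -d)

# ">file 2>&1": stdout is pointed at the file first,
# then stderr copies that target, so both lines land in the file.
emit >"$tmp/a" 2>&1
both=$(cat "$tmp/a")

# "2>&1 >file": stderr copies stdout's current target (here the capture
# pipe of the command substitution); only then is stdout moved to the file.
leaked=$(emit 2>&1 >"$tmp/b")
only_out=$(cat "$tmp/b")

printf 'file a holds: %s\n' "$both"
printf 'escaped the file: %s\n' "$leaked"
printf 'file b holds: %s\n' "$only_out"

rm -rf "$tmp"
```

In the second form the word `err` never reaches the file, which matches the `cmd 2>&1 >a` session in the question.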

where are the executable itself stdout and stderr pointing to?

File descriptors are inherited from the parent process. The file descriptors of Bash (or of any parent process) point somewhere; when a child process is created, all file descriptors are copied and point to the same somewhere.
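A minimal sketch of that inheritance, assuming a scratch file and descriptor number 3 chosen purely for illustration: the parent shell opens the descriptor, and a separate child `bash` process writes through it without ever opening the file itself.

```shell
#!/usr/bin/env bash
tmp=$(mktemp)

exec 3>"$tmp"                  # the parent shell opens fd 3 on the scratch file
echo parent >&3                # the parent writes through fd 3
bash -c 'echo child >&3'       # the child inherits fd 3 and writes through it
echo parent-again >&3          # the parent continues after the child's data
exec 3>&-                      # close fd 3 in the parent

inherited=$(cat "$tmp")
printf '%s\n' "$inherited"
rm -f "$tmp"
```

The three lines end up one after another rather than overwriting each other, because parent and child share the same file table entry and therefore the same file offset.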

Related: man dup2, man clone, man fork, man exec. I really understood what an fd is after I understood how flock works.

Also, don't trust me! You can observe what a process is doing with strace.

$ LC_ALL=C strace -ff -e open,openat,clone,dup2,write bash -c 'cmd 2>&1 >/tmp/a'
openat(AT_FDCWD, "/etc/ld.so.preload", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/usr/lib/libreadline.so.8", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/usr/lib/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/usr/lib/libncursesw.so.6", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/dev/tty", O_RDWR|O_NONBLOCK) = 3
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7afe089c6e50) = 90612
strace: Process 90612 attached
[pid 90612] dup2(1, 2)                  = 2
[pid 90612] openat(AT_FDCWD, "/tmp/a", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
[pid 90612] dup2(3, 1)                  = 1
[pid 90612] write(2, "bash: line 1: cmd: command not f"..., 37bash: line 1: cmd: command not found
) = 37
[pid 90612] +++ exited with 127 +++
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=90612, si_uid=1000, si_status=127, si_utime=0, si_stime=0} ---
+++ exited with 127 +++

May you explain why the outcome of dup2(1,2) followed by dup2(3, 1) is another one when I switch their order?

Let's try! Let's assume we have the following input state at some program initialization, after the process has started and opened the file /tmp/a as file descriptor 3.

enter image description here

  1. First case:

    1. Execute dup2(1, 2). Nothing happens. 2 already points to /dev/tty.

    2. Then execute dup2(3, 1). Below is the resulting connection. 2 still points to /dev/tty, but 1 was moved to /tmp/a.

      enter image description here

  2. Second case:

    1. Execute dup2(3, 1), so we move 1 arrow to where 3 is pointing. Below is the result:

      enter image description here

    2. Then execute dup2(1, 2), so we move 2 to where 1 is pointing to.

      enter image description here

As you can see, the result is different. In the dup2(3, 1); dup2(1, 2) case, file descriptors 1, 2 and 3 all point to /tmp/a. In contrast, in the dup2(1, 2); dup2(3, 1) case, 2 is still pointing to /dev/tty.

The images above are missing the "file table" pointers, because I am too lazy to draw them. It would be just [0,1,2] -> readwrite -> /dev/tty and 3 -> write -> /tmp/a, for simplicity.
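The two cases can also be replayed from the shell, as a sketch: `exec 2>&1` corresponds to dup2(1, 2) and `exec 1>&3` to dup2(3, 1). The helper `fd2_is` and the scratch file are assumptions made for this illustration; the check relies on Bash's `[` treating `/dev/fd/N` as "file descriptor N".

```shell
#!/usr/bin/env bash
tmp=$(mktemp)

# True when file descriptor 2 refers to the same file as the path in $1.
fd2_is() { [ /dev/fd/2 -ef "$1" ]; }

# First case: dup2(1, 2), then dup2(3, 1).
if ( exec 3>"$tmp"; exec 2>&1; exec 1>&3; fd2_is "$tmp" ); then
  case1="fd 2 -> file"
else
  case1="fd 2 -> elsewhere"
fi

# Second case: dup2(3, 1), then dup2(1, 2).
if ( exec 3>"$tmp"; exec 1>&3; exec 2>&1; fd2_is "$tmp" ); then
  case2="fd 2 -> file"
else
  case2="fd 2 -> elsewhere"
fi

echo "first case:  $case1"
echo "second case: $case2"
rm -f "$tmp"
```

Each case runs in its own subshell, so the descriptors of the calling shell are left untouched.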

Let's do also this:

Have you succeeded in correctly predicting the outcome of the commands above in advance? If yes, how did you arrive at the right prediction?

~ $ cat f g 2>&1 >h

2>&1 makes stderr point to where stdout points at that moment, which is the terminal. Then >h makes stdout go to h. The content of file f will be in file h. The error message will be on the terminal.

~ $ cmd >a 2>&1

The output of command cmd will go to a. Then stderr will also go to a. cmd does not exist. Bash will spawn a child process, then try to execute cmd. That will fail, so the temporary Bash child subshell will print an error message to file a.

~ $ cmd 2>&1 >a

Same as the 2>&1 >h case above. The message just does not come from cat, but from the temporary subshell Bash spawns to exec cmd.

~ $ cmd >a>b>c 2>&1

This is equal to cmd >a >b >c 2>&1. First, stdout goes to a. Then the file descriptor that references file a is closed, and stdout goes to another file descriptor coming from opening file b. Then b is closed, and stdout goes to c. Then stderr also goes to c. The error message about cmd: command not found will be in c, as the temporary subshell will print it to stderr, which points to c now. Because > open()s files with O_CREAT and O_TRUNC, files a and b will exist and be empty.
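The chained case can be checked with a short sketch. The scratch directory and the `{ ...; }` group that stands in for `cmd` are assumptions of this example; the group simply writes one line to each stream.

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)
echo abc >"$dir/a"; echo bcd >"$dir/b"; echo cde >"$dir/c"

# Every >file opens (and truncates) its file, but only the last one
# remains the target of stdout; stderr then copies that target.
{ echo out; echo err >&2; } >"$dir/a" >"$dir/b" >"$dir/c" 2>&1

if [ -s "$dir/a" ]; then a_state=non-empty; else a_state=empty; fi
if [ -s "$dir/b" ]; then b_state=non-empty; else b_state=empty; fi
c_content=$(cat "$dir/c")

echo "a is $a_state, b is $b_state"
printf 'c holds:\n%s\n' "$c_content"
rm -rf "$dir"
```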


That was easy. Now my turn! Let me throw in some examples of my own. We have a function that writes the word stdout to stdout and the word stderr to stderr.

output() { echo stdout; echo stderr >&2; }

Then we take the magic time. time is a shell keyword. time outputs the timing message from the parent shell, not from the child subshell Bash spawns when executing the command. This is in contrast to the cmd: command not found message, which is output by the child subshell spawned to execute cmd. The time message is printed by the parent shell after the child exits.
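A hedged sketch of that behaviour: `TIMEFORMAT` is set to a fixed text (an assumption made only to get a predictable report instead of real/user/sys figures), and command substitutions are used to see where the report ends up.

```shell
#!/usr/bin/env bash
TIMEFORMAT='timing-report'     # fixed text instead of the usual timing figures

kind=$(type -t time)           # how Bash itself classifies "time"

# The 2>&1 here belongs to "true", not to time. The report goes to the
# stderr of the ( ... ) subshell, which is discarded, so nothing is captured.
inner=$( ( time true 2>&1 ) 2>/dev/null )

# Redirecting stderr of a group around the whole "time ..." catches the report.
grouped=$( { time true; } 2>&1 )

echo "time is a: $kind"
echo "captured via the timed command's own redirection: [$inner]"
echo "captured via the surrounding group: [$grouped]"
```

Redirections written after the timed command apply to that command only, so the report has to be caught one level further out.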

What is the output of the following commands? What is written to stdout? And what is written to stderr?

  1. output 3>&2 1>&2 2>&3
  2. output 3>&1 1>&2 2>&3
  3. var=$(output 3>&1 1>&2 2>&3); echo "VAR: $var"
  4. var=$(output 3>&1 1>&2 2>&3) 2>/dev/null; echo "VAR: $var"
  5. { var=$(output 3>&1 1>&2 2>&3); } 2>/dev/null; echo "VAR: $var"
  6. time output >/dev/null 2>&1
  7. ( time output 2>&1 ) >/dev/null
  8. { time output 2>&1; } 2>&1 >/dev/null
  9. var=$( { time output 2>&1; } 3>&1 1>&2 2>&3 ); echo "TIME: $var"
  10. exec 3>&1; var=$( { time output >&3 2>&1; } 2>&1 ); echo "TIME: $var"
  11. var=$( { time output 2>&1; } 3>&1 1>&2 2>&3 ) 2>&1; echo "TIME: $var"
  12. { var=$( { time output 2>&1; } 3>&1 1>&2 2>&3 ); } 2>&1; echo "TIME: $var"
Claudio On

In the spirit of the oOo way of doing things (check out my profile for more), I suggest that you set up a test environment like the one shown in the image below. With screenkey running, it consists of three kakoune text editor windows, a Terminal, and a Thunar file manager window tracking the current directory of the shell session; kakoune instantly updates the content of the files affected by shell commands. Such a setup will help you find out for yourself how things work, by trying various commands creatively while watching the details of the outcome in windows that update their content instantly.

testEnvironmentSetup

The image above captures a frame of a video that explains how it comes about that files a and b are emptied and only file c is filled with content, and that the next command, >a>b>c, will empty files a, b and c:

testEnvSettingFinalFrame


The trouble you face while trying to understand the first two cases posted in your question is rooted in the wrong assumption that redirection can be explained with a model of hosepipes and redirecting stderr to stdout means to let the stderr pipe pump its water into the stdout pipe.

Generally speaking, it is certainly not bad advice to refrain from believing that the tutorials available online, and the pictures they provide on this subject, are a true and correct explanation. Free speech gives everyone the right to put their own weak understanding of a subject, combined with a bad choice of words, online. It is YOU who needs to approach online tutorials with an appropriate amount of doubt about their correctness.

The widespread wording that the expression 2>&1 describes a redirection of stderr to stdout is not only misleading, but simply not true. So how come it remains a valid mainstream description? Hard to guess ... maybe because no one really cares? You may ask: 'if the mainstream description is not true, what is true then'? I don't know ... all I can say is that stating it the following way will be more correct:

2>&1 is a statement involving data streams called stderr and stdout, where a data stream is a concept that includes a target which will receive the data written to the stream. The target of the data stream can be considered the value of what is named stdout or stderr, or, using the numeric aliases of these names, 1 and 2. The numeric aliases can give the impression that 1 is the value of stdout and 2 the value of stderr, but the actual value of stdout alias 1 and stderr alias 2 is a pointer to the recipient of the data.

There is no redirection of one stream into another stream ... what actually really happens is an assignment of a new target to a stream described in terms of a target used by another stream at the moment of assignment.
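The assignment view can be modelled with ordinary shell variables, as a sketch only: the array `fd` and the target names `terminal` and `a` are hypothetical labels, not real descriptors. `N>&M` is modelled as "copy the current value of M into N".

```shell
#!/usr/bin/env bash
# Model only: each descriptor number holds the name of its current target.
fd=([1]=terminal [2]=terminal)

# cmd 2>&1 >a
fd[2]=${fd[1]}     # 2>&1 : fd 2 takes fd 1's target as of now (terminal)
fd[1]=a            # >a   : fd 1 gets a new target; fd 2 is unchanged
first="1->${fd[1]} 2->${fd[2]}"

fd=([1]=terminal [2]=terminal)

# cmd >a 2>&1
fd[1]=a            # >a   : fd 1 gets the new target first
fd[2]=${fd[1]}     # 2>&1 : fd 2 copies fd 1's target, which is now a
second="1->${fd[1]} 2->${fd[2]}"

echo "2>&1 >a  gives $first"
echo ">a 2>&1  gives $second"
```

Nothing links the two entries after the copy, just as nothing links two variables after `x=$y`.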

If you change your mental model to another one, as jhnc suggested in a comment on your question, you will be able to predict the correct result in the first two cases. In that model, redirection puts the hosepipe into a basin, and redirecting stderr to stdout means putting the stderr pipe into the basin where the stdout pipe is placed at the moment the redirection takes place, while the command is evaluated from left to right.

The hosepipe/basin mental model does not provide the right prediction in the third case, so it needs some improvement to cover this case as well. The improvement consists of becoming aware that putting a hosepipe into a basin using > (instead of >>) actually empties the basin first, as suggested in the comment by Renaud Pacalet, even if there is no water in the hosepipe. In other words, when its target is a file, a hosepipe is able to fill the "basin" with emptiness if there is no "water" in the pipe to put there.
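The difference between > and >> for a "hosepipe without water" can be checked with a short sketch. The scratch file `basin` is an assumption of this example, and `:` is used as a command that writes nothing.

```shell
#!/usr/bin/env bash
basin=$(mktemp)
echo water >"$basin"

: >>"$basin"       # append mode, nothing written: the content survives
after_append=$(cat "$basin")

: >"$basin"        # truncate mode, nothing written: the file is emptied on open
if [ -s "$basin" ]; then after_truncate=non-empty; else after_truncate=empty; fi

echo "after >>: $after_append"
echo "after > : $after_truncate"
rm -f "$basin"
```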

What perhaps still remains unexplained is why putting the hosepipe into the 'basin' of the terminal does not empty it in the first place. It seems that the mental model of a Terminal 'basin' needs to be refined by taking the shell into consideration.

The image provided in the Wikipedia article is not helpful here; it is an excellent example of how badly you can distort your understanding if you take the information the image provides as correct.

Let's apply the suggested pipe/basin mental model to explain the outcome of the "riddles" suggested in the other answer:

true ' stdin, stdout, stderr are file descriptor names: 
        /dev/stderr
        /dev/stdin
        /dev/stdout
which point (do they???) to the same resources as
        /dev/fd/0
        /dev/fd/1
        /dev/fd/2
BUT ... each application gets its own stdin, stdout and stderr specific to this application.        '

# Let’s define a function, that outputs  "stdout"  to stdout and  "stderr"  to stderr.
output() { printf 'stdout '   ;  printf 'stderr ' >&2  ; echo ;  }
: ' The function   output   streams the word ’stdout’ to the pipe called ’1’ ( alias stdout ),
     the word ’stderr’ to the pipe called ’2’  ( alias stderr ) and the newline to  pipe    ’1’         '

: 'What will be the output of the following commands? '
echo '--1--'; output    3>&2    1>&2    2>&3    | grep --color std
echo '--2--'; output    3>&1    1>&2    2>&3    | grep --color std
echo '--3--'; output    3>&1    2>&1    1>&3    | grep --color std
# Let’s explain the most interesting output '--2--' using the pipe/basin mental model, knowing that  grep --color  highlights only what reaches it through the pipe:
#       --2--
#       stdout      --> NOTICE: ’std’ will appear in the same default color as ’out’
#       stderr      --> NOTICE: ’std’ will appear colored / highlighted
#echo '--2--'; output   3>&1    1>&2    2>&3    | grep --color std
#                   ^-- 3>&1 : a pipe ’3’ (existing or created) is put into the basin where pipe 1 (alias stdout) is at that moment, which is the input of grep.
#                           ^-- 1>&2 : pipe ’1’ (alias stdout) is put into the basin where pipe 2 is, i.e. the stderr target of the shell, the Terminal.
#                                   ^-- 2>&3 : pipe ’2’ (alias stderr) is put where pipe 3 is at the current moment, and this is the input of grep.
#           ^-- the function streams the word ’stdout’ via pipe ’1’ (alias stdout) to the stderr target of the shell, so it is finally echoed in Terminal WITHOUT COLORING,
#               and then streams the word ’stderr’ via pipe ’2’ (alias stderr) to the input of grep, which echoes it to Terminal WITH COLORING
#   The line feed between stdout and stderr is caused by streaming the newline via pipe 1, after the word 'stdout', to the stderr target
echo '--4--'
 var=$(output 3>&1 1>&2 2>&3); echo "var: $var"
 # using a variable var and $(...) allows separating the stdout and stderr parts of  output  because only the stdout part of the final output goes to the variable var
 #  with the help of pipe 3 stdout and stderr are switched, so that the word 'stderr' arrives at the variable and the word 'stdout' is printed to the Terminal (i.e. the stderr target of the shell)
 #      with the help of this construct the stdout and stderr parts can be distinguished like it was done using grep coloring
echo '--5--'
var=$(output 3>&1 1>&2 2>&3) 2>/dev/null; echo "var: $var"
# has the same effect as --4-- because the variable assignment, including $(...), is evaluated prior to 2>/dev/null, so the redirection has no effect here
echo '--6--'
{ var=$(output 3>&1 1>&2 2>&3); } 2>/dev/null; echo "VAR: $var"
# now 2>/dev/null applies to everything within {...} and "swallows" the word 'stdout', so that it does not appear in the output
echo '--7-- '
time output >/dev/null 2>&1
# first pipe 1 of stdout is put into the nowhere basin, then pipe 2 of stderr is put where pipe 1 is, and this is nowhere. The report of the time command is not affected by any redirections here.
echo '--8--'
( time output 2>&1 ) >/dev/null 
# even putting time into a subshell does not change anything about the time report, because time reports on stderr and only stdout of the subshell is put into nowhere
echo '--9--'
{ time output 2>&1; } 2>&1 >/dev/null
# the time report stays visible because stderr of the group was put where the shell stdout is before stdout is put into nowhere
echo '--10--'
var=$( { time output 2>&1; } 3>&1 1>&2 2>&3 ); echo "TIME: $var"
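# 'stdout stderr' is printed directly to the Terminal, while the time report is written to stderr of the {...} group, which the swap puts into the $(...) capture; var therefore holds the time report, which starts with a newline, so it shows up on the lines after 'TIME: '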
echo '--11--'
exec 3>&1; var=$( { time output >&3 2>&1; } 2>&1 ); echo "TIME: $var"
# same outcome as --10--: output is sent via pipe 3 to the Terminal, and the time report goes via stderr of the {...} group into $(...) and so into var
echo ' --12-- '
var=$( { time output 2>&1; } 3>&1 1>&2 2>&3 ) 2>&1; echo "TIME: $var"
# same as --10--: the trailing 2>&1 on the assignment has no effect (compare --5--)
echo '--13--'
{ var=$( { time output 2>&1; } 3>&1 1>&2 2>&3 ); } 2>&1; echo "TIME: $var"
# same as --10--, except that the {...} around the assignment puts 'stdout stderr' where the shell stdout is instead of stderr; both are the Terminal here
echo '==  =='

Considering the comments in the script and the output:

~ $ ./_oo--shPrgLang--_O--pipeBasin-mentalModel-forUnderstanding-shellRedirection--_oo0.sh
--1--
stdout stderr 
--2--
stdout 
stderr 
--3--
stdout stderr 
--4--
stdout 
var: stderr 
--5--
stdout 
var: stderr 
--6--
VAR: stderr 
--7-- 

real    0m0.000s
user    0m0.000s
sys 0m0.000s
--8--

real    0m0.000s
user    0m0.000s
sys 0m0.000s
--9--

real    0m0.000s
user    0m0.000s
sys 0m0.000s
--10--
stdout stderr 
TIME: 
real    0m0.000s
user    0m0.000s
sys 0m0.000s
--11--
stdout stderr 
TIME: 
real    0m0.000s
user    0m0.000s
sys 0m0.000s
 --12-- 
stdout stderr 
TIME: 
real    0m0.000s
user    0m0.000s
sys 0m0.000s
--13--
stdout stderr 
TIME: 
real    0m0.000s
user    0m0.000s
sys 0m0.000s
==  ==

It appears that the pipe/basin mental model is suitable for explaining the effects of redirections, if coupled with appropriately deep know-how about the further constructs available in shell scripting.
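The stream swap used in --4-- to --6-- can also be checked without watching the Terminal, as a sketch: the scratch file `side` is an assumption of this example and takes the place of /dev/null in --6--, so that what arrives on stderr can be read back afterwards.

```shell
#!/usr/bin/env bash
output() { printf 'stdout '; printf 'stderr ' >&2; echo; }

side=$(mktemp)     # hypothetical scratch file standing in for the Terminal

# Same swap as in --6--, but stderr of the group goes to the scratch
# file instead of /dev/null so it can be inspected.
{ var=$(output 3>&1 1>&2 2>&3); } 2>"$side"
on_stderr=$(cat "$side")
rm -f "$side"

printf 'captured in var:   [%s]\n' "$var"
printf 'arrived on stderr: [%s]\n' "$on_stderr"
```

The word written to stderr ends up in the variable, and the word written to stdout (together with the newline) ends up on the stderr side.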