I have a gzipped file that I've split into 3 separate files: xaa, xab, xac. I make a fifo
mkfifo p1
and reassemble the files by reading from it, also calculating a checksum and unzipping the file in a pipe:
cat p1 p1 p1 | tee >(sha1sum > sha1sum_new.txt) | gunzip > output_file.txt
This works just fine if I feed the pipe from another terminal with
cat xaa > p1
cat xab > p1
cat xac > p1
but if I feed the pipe with a single line,
cat xaa > p1; cat xab > p1; cat xac > p1
the receiving pipeline hangs, no checksum is produced, and although an output file is produced, it is truncated - but by an amount smaller than the final file size.
Why is the behavior in the second case different from the first?
Interesting question. As the other answer mentions, you have a race condition - and I am pretty sure of that. In fact, you have a race condition in both cases, but in the former you're just lucky it doesn't happen because maybe your files are small and can be read before you enter the next command line. Allow me to explain.
So, a little bit of background first:
- cat opens each file you feed it as an argument sequentially, prints it to the output, and then closes the file and moves on to the next file. The exact details of whether cat opens each file sequentially or opens them all first and then writes each file sequentially may vary, but it's not relevant for the discussion. In both cases, you'll have a race condition.
- The open(2) syscall will block on a FIFO / pipe until the other end is opened. So for example, if process pid1 opens the FIFO for reading, open(2) will block until, say, pid2 opens the FIFO for writing. In other words, opening a FIFO that has no active readers or writers implicitly synchronizes both processes and guarantees that a process will not read from a pipe that has no writer yet, or that a writer will not write to a pipe that has no reader yet. But as we will see, this will be problematic.

What's really happening
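Before walking through the two cases, the blocking behavior of open(2) on a FIFO described in the background notes can be checked directly. This is only a minimal sketch: demo_fifo is a made-up name, the temporary directory is my own scaffolding, and timeout is assumed to be the GNU coreutils tool (it exits with status 124 when it has to kill the command).

```shell
# Sketch: opening a FIFO for reading blocks until a writer shows up.
set -u
dir=$(mktemp -d)
cd "$dir" || exit 1
mkfifo demo_fifo        # hypothetical name, unrelated to p1

# No writer exists, so cat never gets past opening the FIFO;
# timeout gives up after 1 second.
timeout 1 cat demo_fifo
lonely_status=$?
echo "reader without a writer: exit status $lonely_status"

# A writer that arrives one second later unblocks the same kind of open.
( sleep 1; echo hello > demo_fifo ) &
paired_output=$(timeout 5 cat demo_fifo)
paired_status=$?
wait
echo "reader with a writer: got '$paired_output', exit status $paired_status"
```

The first cat is killed while still waiting in open, the second one returns as soon as the writer opens the other end.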
When you enter the three commands one at a time from another terminal, things are really slow, because humans are slow. After you enter the first line, cat opens p1 for writing. The other cat is blocked on opening it for reading (or maybe not yet, but let's assume it is). Once both cat processes have opened p1 - one for writing, the other for reading - data starts to flow.

And then, before you even have the chance to enter the next command line (cat xab >p1), the whole file flows through the pipe and everyone is happy: the cat writer finishes writing the file and closes p1, and the cat reader sees an end of file on the pipe and calls close(2). The cat reader moves on to the next file (which is p1 again), opens it, and blocks because no active writers have opened the FIFO yet.

Then, you, slow human, enter the next command line, which causes another cat writer process to open the FIFO, which unblocks the other cat that is waiting to open for reading, and everything happens again. And then again for the third command line.

When you put everything in one line in the shell, things happen way too fast.
Let's differentiate the 3 cat invocations. Call them cat1, cat2 and cat3, in the order they appear on the command line (cat1 writes xaa, cat2 writes xab, cat3 writes xac).

The shell executes each command sequentially, waiting for the previous command to finish before moving to the next one.

However, it might just be the case that cat1 finishes writing everything to p1 and exits, and the shell moves on to cat2, which opens the FIFO and starts writing the contents of xab to p1, all before the cat reader has had the chance to finish reading what cat1 wrote in the first place. A reader only sees end of file on a FIFO when the pipe is empty and no process has it open for writing, so that moment never comes. The cat reader "thinks" it's still reading from the first file (the first p1), but at some point it starts reading the data that cat2 pushed into the pipe (as if it were part of the first p1). It has no way of knowing that the first "copy" of the data is over if cat2 is faster and opens the FIFO before the cat reader finishes reading what cat1 wrote.

Yes, subtle, but it's exactly what is happening.
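The merge described above can be reproduced on purpose by making the reader artificially slow. The following is a sketch under my own assumptions, not the original setup: the sleep stands in for a busy reader, the two printf commands stand in for cat1 and cat2, and first_read.txt is a made-up file name.

```shell
# Sketch: two fast writers end up inside a single read "session".
set -u
dir=$(mktemp -d)
cd "$dir" || exit 1
mkfifo p1

# Slow reader: opens p1 once, dawdles for a second, then reads until EOF.
{ sleep 1; cat; } < p1 > first_read.txt &
reader=$!

# Two fast writers run back to back, like "cat xaa > p1; cat xab > p1".
printf 'from cat1\n' > p1
printf 'from cat2\n' > p1

# The reader only got EOF after the second writer had come and gone.
wait "$reader"
merged=$(cat first_read.txt)
printf '%s\n' "$merged"
```

On Linux I would expect both lines to land in first_read.txt, even though the reader opened p1 only once: the second writer opened the FIFO while the reader still had it open, so the reader never saw an end of file in between.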
Then, of course, input eventually comes to an end, and the cat reader thinks that the first p1 is done and moves on to the next p1, opening it and waiting for the next writer to open it. But there will never be a next writer! It blocks forever, and the whole pipeline is stalled forever.

How to fix it
The solution in the other answer solves the problem. You mentioned in the comments that it might not be enough for you because you don't control when and how a new writer opens and uses the pipe.
So I suggest this instead:
- Keep the FIFO open for writing the whole time by running cat from standard input to p1 in the background:
cat >p1 &
When you're done, kill the background job.
- Open the FIFO only once for reading, either with
cat p1 | tee >(sha1sum ...)
or using the method proposed in the other answer:
tee >(...) <p1
After all, opening a FIFO once should be enough no matter how complex your system is; FIFOs by nature always give you the data in a first in first out fashion.

Keep the background cat writer running as long as you know that there's a chance of new files arriving / new writers opening the FIFO and using it. Don't forget to terminate the background job when you know that input is over.
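Put together, the suggestion could look roughly like the script below. Treat it as a sketch with several assumptions of mine: the sample data generated with seq stands in for the real file, split -n is the GNU coreutils option, and the checksum stage is left out to keep it short. It also swaps the background cat >p1 & for a spare file descriptor (exec 3> p1), which is the same idea of holding a write end open; in a non-interactive script a background cat typically gets /dev/null as standard input and would exit right away.

```shell
# Sketch: one long-lived write end, one reader that opens the FIFO once.
set -eu
dir=$(mktemp -d)
cd "$dir"

# Hypothetical sample data standing in for the real gzipped file.
seq 1 50000 > original.txt
gzip -c original.txt > whole.gz
split -n 3 whole.gz            # GNU split: creates xaa, xab, xac

mkfifo p1

# Reader: opens p1 exactly once and decompresses until EOF.
gunzip < p1 > output_file.txt &
reader=$!

# Holder: keep a write end open so the reader sees no EOF between writers.
exec 3> p1

# The writers can now come and go as fast as they like.
cat xaa > p1; cat xab > p1; cat xac > p1

# Input is over: release the holder so the reader finally gets EOF.
exec 3>&-
wait "$reader"

cmp original.txt output_file.txt && echo "reassembled correctly"
```

The important part is the ordering at the end: the reader can only finish once the last write end, the holder, has been closed.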