Output to the same file sequence


Suppose we have myScript.sh as below:

#!/bin/bash
do something with $1 > bla.txt
do something with bla.txt > temp.txt
...
cat temp.txt >> FinalOutput.txt

Then we run parallel as below:

parallel myScript.sh {} ::: {1..3}

Does it write the output in order? Will FinalOutput.txt have the results of 1 first, then 2, and then 3?

Note: I currently write the output to separate files and merge them in the required order once parallel is complete. I am wondering whether I can avoid this step.


There are 2 answers

larsks (best answer)

The processes are run in parallel. Not only is there no guarantee that they will finish in order, there's not even a guarantee that you can have multiple processes writing to the same file like that and end up with anything useful.

If you are going to be writing to the same file from multiple processes, you should implement some sort of locking to prevent corruption. For example:

while ! mkdir FinalOutput.lock; do
    sleep 1
done

cat temp.txt >> FinalOutput.txt
rmdir FinalOutput.lock
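
As a rough illustration of the lock above, here is a minimal sketch that wraps it in a function and uses three background jobs as stand-ins for the parallel jobs. The function name append_locked and the "job N line M" contents are made up for the demo. The lock only keeps each job's chunk intact; the chunks can still land in any order.

```shell
#!/bin/bash
# Sketch only: append_locked and the file contents are illustrative.
rm -rf FinalOutput.txt FinalOutput.lock

append_locked() {
    # mkdir is atomic: only one process at a time can create the directory,
    # so whoever succeeds holds the lock until rmdir.
    until mkdir FinalOutput.lock 2>/dev/null; do
        sleep 1
    done
    cat "$1" >> FinalOutput.txt
    rmdir FinalOutput.lock
}

# Three background jobs stand in for the jobs started by parallel.
for i in 1 2 3; do
    (
        printf 'job %s line 1\njob %s line 2\n' "$i" "$i" > "temp-$i.txt"
        append_locked "temp-$i.txt"
        rm -f "temp-$i.txt"
    ) &
done
wait
```

If a job is killed while it holds the lock, FinalOutput.lock stays behind and has to be removed by hand, so a real script would probably want a trap that calls rmdir on exit.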

If order matters, you should have each script write to a unique file, and then assemble the final output in the correct order after all your parallel jobs have finished.

#!/bin/bash
# Use per-job file names so that parallel jobs do not overwrite each other.
do something with $1 > bla-$1.txt
do something with bla-$1.txt > temp-$1.txt
...

And then after parallel has finished:

cat temp-*.txt > FinalOutput.txt
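
One caveat with the glob: it expands in lexical order, so with more than nine jobs temp-10.txt would be concatenated before temp-2.txt. A minimal sketch of a merge that follows the numeric order instead, assuming the jobs were numbered 1 to 12 (the "result of N" lines are placeholders for real job output):

```shell
#!/bin/bash
# Sketch only: the echo below stands in for the real per-job output.
rm -f temp-*.txt FinalOutput.txt
for i in $(seq 1 12); do
    echo "result of $i" > "temp-$i.txt"
done

# Walk the same argument list that was given to parallel, in order,
# instead of relying on the lexical order of temp-*.txt.
for i in $(seq 1 12); do
    cat "temp-$i.txt"
done > FinalOutput.txt
rm -f temp-*.txt
```

For the {1..3} case in the question the plain glob is enough; the loop only matters once the numbers have different widths.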

Ole Tange

The ideal way is to avoid temp files altogether. That can often be done by using pipes:

parallel 'do something {} | do more | something else' ::: * > FinalOutput

But if that is impossible, then use temp files whose names depend on {#}, which is the job sequence number in GNU Parallel:

doer() {
  do something $1 > $2.bla
  do more $2.bla > $2.tmp
  something else $2.tmp
}
export -f doer
parallel doer {} {#} ::: * > FinalOutput
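
On the ordering part of the question: by default GNU Parallel prints each job's output as a whole block when that job finishes, so the blocks are not mixed but they can arrive out of input order. The -k (--keep-order) option makes the output follow the order of the input arguments. A minimal sketch, where the sleep and echo are stand-ins for real work and job 1 is deliberately made the slowest:

```shell
#!/bin/bash
# Sketch only: runs the demo if GNU Parallel is installed, otherwise skips.
if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
    # Job 1 sleeps longest, job 3 not at all, so without -k job 3 would
    # normally be printed first when jobs run concurrently.
    parallel -k 'sleep $((3 - {})); echo "result of {}"' ::: 1 2 3 > FinalOutput.txt
    cat FinalOutput.txt
else
    echo "GNU Parallel not found, skipping the demo" >&2
fi
```

The same option can be added to either command above, for example parallel -k doer {} {#} ::: * > FinalOutput.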