There are some commands I'd like to run on a grid using qsub (SGE 8.1.3, CentOS 5.9) that need to use a pipe (|) or a redirect (>). For example, let's say I have to parallelize the command
echo 'hello world' > hello.txt
(Obviously a simplified example: in reality I might need to redirect the output of a program like bowtie directly to samtools). If I did:
qsub echo 'hello world' > hello.txt
the resulting content of hello.txt would look like
Your job 123454321 ("echo") has been submitted
Similarly, if I used a pipe (echo "hello world" | myprogram), that message is all that would be passed to myprogram, not the actual stdout.
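The reason is that the local shell consumes the | or > before qsub ever runs, so qsub only receives the words to its left. Here is a rough Python sketch of that split, using shlex as a stand-in for the shell's tokenizer. The helper name what_qsub_sees is made up for illustration, and a real shell handles far more cases than this.

```python
import shlex

def what_qsub_sees(command_line):
    """Split a command line the way the local shell roughly would.

    Returns (argv passed to the first program, tokens the local shell
    keeps for itself). Only |, >, >> and < are recognised here.
    """
    lexer = shlex.shlex(command_line, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    tokens = list(lexer)
    for pos, tok in enumerate(tokens):
        if tok in ("|", ">", ">>", "<"):
            # Everything from the operator onwards never reaches the program.
            return tokens[:pos], tokens[pos:]
    return tokens, []

argv, handled_locally = what_qsub_sees("qsub echo 'hello world' > hello.txt")
print(argv)             # the arguments qsub actually receives
print(handled_locally)  # the part applied to qsub's own stdout
```

In this sketch qsub is handed only echo and 'hello world', while the redirect is applied to qsub's own output, which is the submission message.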
I'm aware I could write small bash scripts that each contain the command with the pipe/redirect, and then do qsub ./myscript.sh. However, I'm trying to run many parallelized jobs at the same time using a script, so I'd have to write many such bash scripts, each with a slightly different command. When scripted, this solution starts to feel very hackish. An example of such a script in Python:
import os

# files is a list of (infile1, infile2, outfile) tuples
for i, (infile1, infile2, outfile) in enumerate(files):
    command = ("bowtie -S %s %s | " +
               "samtools view -bS - > %s\n") % (infile1, infile2, outfile)
    # one throwaway script per job, numbered by the loop index
    script = "job" + str(i) + ".sh"
    with open(script, "w") as f:
        f.write(command)
    os.system("chmod 755 %s" % script)
    os.system("qsub -cwd ./%s" % script)
This is frustrating for a few reasons, among them that my program can't even delete the many jobXX.sh scripts afterwards to clean up after itself, since I don't know how long the job will be waiting in the queue, and the script has to be there when the job starts.
Is there a way to provide my full echo 'hello world' > hello.txt command to qsub without having to create another file containing the command?
You can do this by turning it into a bash -c command, which lets you put the | in a quoted statement:
As @spuder has noted in the comments, it seems that in other versions of qsub (not SGE 8.1.3, which I'm using), one can solve the problem with:
as well.
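Applied to the bowtie loop from the question, the bash -c approach might look like the sketch below. It assumes qsub accepts binary submissions via -b y (an SGE option; whether it is needed depends on the site's defaults). The helper names build_qsub_argv and submit_all and the sample file names are invented for illustration. Passing an argument list to subprocess means no local shell ever touches the pipe, and shlex.quote protects file names that contain spaces.

```python
import shlex
import subprocess

def build_qsub_argv(infile1, infile2, outfile, qsub_options=("-b", "y", "-cwd")):
    """Build a qsub argument list that wraps the pipeline in bash -c.

    qsub_options is an assumption; adjust it to your site's setup.
    """
    pipeline = "bowtie -S %s %s | samtools view -bS - > %s" % (
        shlex.quote(infile1), shlex.quote(infile2), shlex.quote(outfile))
    # The whole pipeline is a single argument, so the | and > are parsed
    # by the bash started for the job, not by the submitting shell.
    return ["qsub"] + list(qsub_options) + ["bash", "-c", pipeline]

def submit_all(files, dry_run=True):
    """Submit one job per (infile1, infile2, outfile) tuple, with no job scripts."""
    for infile1, infile2, outfile in files:
        argv = build_qsub_argv(infile1, infile2, outfile)
        if dry_run:
            # Print a copy-pasteable form instead of calling qsub.
            print(" ".join(shlex.quote(arg) for arg in argv))
        else:
            subprocess.run(argv, check=True)

# Hypothetical inputs, printed only (dry_run=True) so nothing is submitted.
submit_all([("index", "reads_1.fq", "sample_1.bam"),
            ("index", "reads_2.fq", "sample_2.bam")])
```

Since nothing is written to disk, there are no jobXX.sh files to clean up. How qsub forwards the quoted argument to the execution host can vary with configuration, so treat this as a starting point and check one job by hand first; the answer above reports the bash -c form working on SGE 8.1.3.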