Loop through different arguments to Rscript within a Korn shell script


I have an R script that I run from the terminal by first creating a .ksh file called myscript.ksh with the following contents:

#!/bin/ksh

Rscript myscript.R 'Input1'

and then running it with

./myscript.ksh

which sends the script to a node on our department's cluster (jobs sent to the cluster must be submitted as .ksh files).

'Input1' is an input argument that the R script uses to perform some analysis.

The issue I now have is that I need to run this script many times with different input arguments. One solution is to generate several .ksh files, such as:

#!/bin/ksh

Rscript myscript.R 'Input2'

and

#!/bin/ksh

Rscript myscript.R 'Input3'

and then execute them separately, but I was hoping to find a better solution.

Note that I have to do this for 100 different input arguments, so it is not realistic to write 100 of these files. Is there a way of generating another file containing the values to be supplied to the script (e.g. 'Input1' 'Input2' 'Input3') and then running myscript.ksh for each of them individually?

For example, I could have a variable holding the names of the input arguments and a loop that passes each one to myscript.ksh. Is that possible?

The reason for running them this way is that each iteration will hopefully be sent to a different node on the cluster, so the data can be analysed much faster.

Answer by Mike Ryan (accepted answer)

You need to do two things:

  1. Create an array of all your input variables
  2. Loop through the array and initiate all your calls

The following illustrates the concept:

#!/bin/ksh

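# Create an array of inputs (elements are separated by spaces)
inputs=(Input1 Input2 Input3 Input4)

# Loop through all the array indices {0 ... n-1}; ${#inputs[@]} is n,
# so the loop adapts automatically when inputs are added or removed
for (( i = 0; i < ${#inputs[@]}; i++ ))
do
   echo "${inputs[i]}"
done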

This will output all the values in the inputs array.

You just need to replace the contents of the do-loop with:

Rscript myscript.R "${inputs[i]}"

You may also need to add an & at the end of the Rscript command line to run each Rscript command as a separate background process; otherwise, the shell waits for each Rscript command to finish before starting the next one.
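The & approach can be sketched as follows. This is a minimal sketch, not the answer's own code: run_analysis is a hypothetical stand-in for Rscript myscript.R so the example runs without R (in a real script the loop body would be Rscript myscript.R "$input" &). It iterates over the array values directly with "${inputs[@]}" instead of over indices, and calls wait so the script does not exit before the background jobs finish. Note that & only runs the jobs in parallel on the machine executing the script; it does not send them to other cluster nodes.

```shell
#!/bin/ksh
# Sketch: start one background job per input with "&", then use "wait"
# to block until all of them have finished. Every job runs on the machine
# executing this script; "&" alone does not spread jobs across cluster nodes.
inputs=(Input1 Input2 Input3)

# Hypothetical stand-in for: Rscript myscript.R "$1"
# Each call writes to its own file so the parallel jobs do not collide.
run_analysis() {
   sleep 1
   echo "done: $1" > "result_$1.txt"
}

# "${inputs[@]}" yields the array values directly, one word per element
for input in "${inputs[@]}"
do
   run_analysis "$input" &   # returns immediately; the job runs in the background
done

wait   # without this, the script could exit before the jobs finish
echo "All ${#inputs[@]} jobs finished"
```

Because the three jobs sleep concurrently, the whole script takes roughly one second rather than three.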


EDIT:

Based on your comments, you actually need to generate .ksh scripts and submit them with qsub. For this, you just need to expand the do loop.

For example:

#!/bin/ksh

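# Create an array of inputs (elements are separated by spaces)
inputs=(Input1 Input2 Input3 Input4)

# Loop through all the array indices {0 ... n-1}; ${#inputs[@]} is n
for (( i = 0; i < ${#inputs[@]}; i++ ))
do
   # The here-document body must not be indented: "#!/bin/ksh" only
   # works as an interpreter line if it starts in column 0 of the file
   cat > submission.ksh << EOF
#!/bin/ksh

Rscript myscript.R "${inputs[i]}"
EOF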

   chmod u+x submission.ksh

   qsub submission.ksh
done

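The << EOF here-document marks the start and end of the text that cat reads from standard input (STDIN); cat's standard output (STDOUT) is redirected to submission.ksh. Because EOF is not quoted, ${inputs[i]} is expanded when the file is written, so each generated script contains the actual input value.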
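The effect of quoting (or not quoting) the here-document delimiter can be checked in isolation. This small sketch writes the same two lines twice, once with an unquoted EOF and once with a quoted 'EOF'; the file names expanded.ksh and literal.ksh are only illustrative. It also shows that the generated #!/bin/ksh line lands in column 0 because the body is not indented.

```shell
#!/bin/ksh
# Sketch: compare an unquoted and a quoted here-document delimiter.
inputs=(Input1 Input2)
i=1

# Unquoted EOF: ${inputs[i]} is expanded when the file is written
cat > expanded.ksh << EOF
#!/bin/ksh
Rscript myscript.R "${inputs[i]}"
EOF

# Quoted 'EOF': the body is copied literally, without expansion
cat > literal.ksh << 'EOF'
#!/bin/ksh
Rscript myscript.R "${inputs[i]}"
EOF

cat expanded.ksh   # second line: Rscript myscript.R "Input2"
cat literal.ksh    # second line: Rscript myscript.R "${inputs[i]}"
```

For generating per-input job scripts, the unquoted form is the one you want.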

Then submission.ksh is made executable with the chmod command.

And then the script is submitted via qsub. I'll let you fill in any other arguments you need for qsub.
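If, as the question suggests, the inputs are kept in a separate file, the loop can read them from there instead of from a hard-coded array. The following is a minimal sketch under stated assumptions: inputs.txt (one input per line), the submission_N.ksh names and the SUBMIT variable are all hypothetical. SUBMIT defaults to echo, so a plain run is a dry run that only prints each script name; running it with SUBMIT=qsub submits the jobs, and you would add any qsub options your cluster needs.

```shell
#!/bin/ksh
# Sketch: read the inputs from a text file (one per line) and write one
# uniquely named submission script per input.
# "inputs.txt" and "submission_N.ksh" are hypothetical names.
# SUBMIT defaults to echo (a dry run that just prints each script name);
# run with SUBMIT=qsub to submit the jobs for real.
SUBMIT=${SUBMIT:-echo}

# Example input file; create yours however is convenient
printf '%s\n' Input1 Input2 "Input with space" > inputs.txt

n=0
while IFS= read -r input
do
   [ -z "$input" ] && continue     # skip blank lines
   n=$((n + 1))
   script="submission_${n}.ksh"
   # Unquoted EOF, so $input is expanded into the generated script
   cat > "$script" << EOF
#!/bin/ksh

Rscript myscript.R "$input"
EOF
   chmod u+x "$script"
   $SUBMIT "$script"
done < inputs.txt
```

Giving each generated script its own name keeps every job's script on disk, which makes it easy to inspect or resubmit a single input later.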

Answer by Walter A

If your script doesn't know all the parameters when it starts, you can make a .ksh file called mycode.ksh that takes the input as a command-line argument:

#!/bin/ksh

if [ $# -ne 1 ]; then
   echo "Usage: $0 input"
   exit 1
fi
# Alternatively, start it in the background with nohup ... &, but that is another question
Rscript myscript.R "$1"

and then run it with ./mycode.ksh inputX
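With this single-argument version, the looping happens on the calling side. The sketch below illustrates that pattern; run_one is a hypothetical stand-in for ./mycode.ksh that applies the same "exactly one argument" check but prints the Rscript command instead of running it, so it works without R installed.

```shell
#!/bin/ksh
# Sketch: drive the single-argument script from a loop on the calling side.
# run_one is a hypothetical stand-in for ./mycode.ksh: it applies the same
# "exactly one argument" check, but prints the Rscript command instead of
# running it, so the sketch works without R installed.
run_one() {
   if [ $# -ne 1 ]; then
      echo "Usage: run_one input" >&2
      return 1
   fi
   echo "Rscript myscript.R $1"
}

for input in Input1 Input2 "Input 3"; do
   run_one "$input"       # in practice: ./mycode.ksh "$input"
done
```

Quoting "$input" in the call matters: without the quotes, "Input 3" would arrive as two arguments and trip the usage check.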

If you know all the arguments up front, you can loop over them instead:

#!/bin/ksh
if [ $# -eq 0 ]; then
   echo "Usage: $0 input(s)"
   exit 1
fi
# "$@" keeps each argument as one word, even if it contains spaces
for input in "$@"; do
   Rscript myscript.R "${input}"
done

and then run it with

./mycode.ksh input1 input2 "input with space in double quotes" input4
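The loop has to use "$@" rather than an unquoted $* for the quoted argument above to survive. The small sketch below makes the difference visible; count_args and show_split are hypothetical helpers that only report how many words each form produces.

```shell
#!/bin/ksh
# Sketch: why the loop should use "$@" rather than an unquoted $*.
# count_args is a hypothetical helper that reports how many words it sees.
count_args() {
   echo "$#"
}

show_split() {
   echo "unquoted \$*: $(count_args $*)"
   echo "quoted \"\$@\": $(count_args "$@")"
}

show_split input1 "input with space" input4
# unquoted $*: 5
# quoted "$@": 3
```

An unquoted $* is split again on whitespace, so "input with space" becomes three separate inputs; "$@" passes each original argument through unchanged.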