How can Python wait for a batch SGE script to finish execution?


I have a problem I'd like you to help me to solve.

I am working in Python and I want to do the following:

  • call an SGE batch script on a server
  • see if it works correctly
  • do something

What I currently do is approximately the following:

    import subprocess
    try:
       tmp = subprocess.call(qsub ....)
       if tmp != 0:
           error_handler_1()
       else:
           correct_routine()
    except:
       error_handler_2()

My problem is that once the script is submitted to SGE, my Python script interprets that as success and carries on as if the job had finished.

Do you have any suggestions for how I could make the Python code wait for the actual result of the SGE script?

Ah, btw, I tried using qrsh, but I don't have permission to use it on this SGE installation.

Thanks!


There are 2 answers

1
Vince On BEST ANSWER

From your code, it looks like you want the program to wait for the job to finish and return its exit code, right? If so, qsub's -sync option (qsub -sync y ...) is likely what you want:

http://gridscheduler.sourceforge.net/htmlman/htmlman1/qsub.html
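As a minimal sketch of how this could look from Python, the snippet below wraps the question's subprocess.call in a helper that prepends -sync y to the qsub command line. The helper name submit_and_wait and its runner parameter are hypothetical and exist only for illustration (the runner makes the command easy to inspect without a cluster). With -sync y, qsub should block until the job ends and, per the man page linked above, exit with the job's exit code when the job completes; check your scheduler's documentation for the behaviour on failed submissions.

```python
import subprocess


def submit_and_wait(script_path, qsub_args=(), runner=subprocess.call):
    """Submit script_path with `qsub -sync y` and return qsub's exit code.

    submit_and_wait and runner are hypothetical names used for this sketch.
    runner defaults to subprocess.call, which blocks until qsub itself
    exits; with -sync y that should be when the job has finished.
    """
    cmd = ["qsub", "-sync", "y"] + list(qsub_args) + [script_path]
    return runner(cmd)


# Example usage on a machine where qsub is available (not executed here):
#
#     if submit_and_wait("my_job.sh") != 0:
#         error_handler_1()
#     else:
#         correct_routine()
```

Passing the command as a list avoids going through a shell, so script paths with spaces do not need quoting.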

1
mattia b On

An additional answer, for easier processing: the Python drmaa module allows more complete interaction with SGE. Working code adapted from its documentation follows (provided you put a sleeper.sh script in the current directory). Please note that the -b n option is needed to execute a .sh script; otherwise a binary is expected by default.

    import os

    import drmaa


    def main():
        """Submit a job and wait for it to finish.

        Note: needs a file called sleeper.sh in the current directory.
        """
        s = drmaa.Session()
        s.initialize()
        print('Creating job template')
        jt = s.createJobTemplate()
        jt.remoteCommand = os.path.join(os.getcwd(), 'sleeper.sh')
        jt.args = ['42', 'Simon says:']
        jt.joinFiles = False
        # Native qsub options; "-b n" submits sleeper.sh as a script, not a binary.
        jt.nativeSpecification = "-m abe -M mymail -q so-el6 -b n"
        jobid = s.runJob(jt)
        print('Your job has been submitted with id ' + jobid)
        # Block until the job has finished.
        retval = s.wait(jobid, drmaa.Session.TIMEOUT_WAIT_FOREVER)
        print('Job: {0} finished with status {1}'.format(retval.jobId, retval.hasExited))
        print('Cleaning up')
        s.deleteJobTemplate(jt)
        s.exit()


    if __name__ == '__main__':
        main()
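Since the question also asks whether the job worked correctly, here is a rough sketch of how the object returned by s.wait() might be turned into a success/failure decision. It assumes that object exposes the hasExited, hasSignal, wasAborted and exitStatus fields of drmaa-python's JobInfo; job_succeeded and FakeJobInfo are hypothetical names, and FakeJobInfo is only a stand-in so the helper can be tried without a cluster.

```python
from collections import namedtuple

# Stand-in with the same field names this sketch reads from the value
# returned by drmaa's Session.wait(); for demonstration only.
FakeJobInfo = namedtuple(
    "FakeJobInfo", ["jobId", "hasExited", "hasSignal", "wasAborted", "exitStatus"]
)


def job_succeeded(info):
    """Return True only if the job ran to completion with exit status 0."""
    if info.wasAborted or info.hasSignal:
        # The job never ran, or was killed by a signal.
        return False
    return bool(info.hasExited) and int(info.exitStatus) == 0
```

In the example above this would be called as job_succeeded(retval), with the result deciding between the question's correct_routine() and error_handler_1().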