I am using drmaa-python to submit jobs to SGE (Sun Grid Engine) and to monitor them. My GUI shows the following statuses:

  • Jobs active in Queue
  • Running Jobs
  • Completed Jobs
  • Failed Jobs
  • Status Undetermined

Sometimes a few jobs end up with Status Undetermined. When I check their status with qstat in a terminal on the SGE host machine, I can see that all jobs are running and none has failed. Status Undetermined is misleading, because users might think those jobs have some sort of problem.

So I understand that the problem is not with SGE, but with the drmaa-python library.

Does anyone know why drmaa-python cannot determine the status?

There is 1 answer

Peter Tröger

drmaa-python is just a thin ctypes wrapper around the DRMAA C library, without any dedicated logic. For this reason, the UNDETERMINED status you see is what the SGE DRMAA C library delivers at the moment you ask.

From what I know, it can happen from time to time that the DRMAA C library for SGE cannot fetch the status. I would recommend adding some custom retry logic to your Python application that simply asks the scheduler again.
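
A minimal sketch of such retry logic, under some assumptions: the helper name `job_status_with_retry`, the retry count and the delay are made up for illustration, and the string `"undetermined"` is assumed to match `drmaa.JobState.UNDETERMINED`. The helper takes the status lookup as a callable, so it does not depend on drmaa itself.

```python
import time

# Assumed to equal drmaa.JobState.UNDETERMINED; pass the real constant if unsure.
UNDETERMINED = "undetermined"


def job_status_with_retry(get_status, job_id, retries=5, delay=2.0,
                          undetermined=UNDETERMINED, sleep=time.sleep):
    """Ask for the job status, asking again while it comes back undetermined.

    get_status is any callable that maps a job id to a status value,
    for example the jobStatus method of an open drmaa session.
    At most `retries` additional queries are made, `delay` seconds apart.
    The last status seen is returned, which may still be undetermined.
    """
    status = get_status(job_id)
    for _ in range(retries):
        if status != undetermined:
            break
        sleep(delay)  # give the scheduler a moment before asking again
        status = get_status(job_id)
    return status


# Possible usage with drmaa-python (not executed here):
#
#   import drmaa
#   with drmaa.Session() as session:
#       state = job_status_with_retry(
#           session.jobStatus, job_id,
#           undetermined=drmaa.JobState.UNDETERMINED)
```

If the status is still undetermined after all retries, the GUI could keep showing the last known state instead of Status Undetermined, but that is a design choice for your application.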