I am using drmaa-python to submit jobs to SGE (Sun Grid Engine) and monitor them. My GUI shows the following status categories:
- Jobs active in Queue
- Running Jobs
- Completed Jobs
- Failed Jobs
- Status Undetermined
Sometimes a few jobs end up under Status Undetermined. When I check the job status with qstat in a terminal on the SGE host machine, all jobs are running and none has failed. Status Undetermined is misleading, because users might think those jobs have some sort of problem.
So I assume the problem is not with SGE but with the drmaa-python library.
Does anyone know why drmaa-python cannot determine the status?
drmaa-python is just a thin ctypes wrapper around the DRMAA C library, without any dedicated logic of its own. For this reason, the UNDETERMINED status you see is whatever the SGE DRMAA C library delivers at the moment you ask.
From what I know, the DRMAA C library for SGE occasionally fails to fetch the status. I would recommend adding some custom retry logic to your Python application that simply asks the scheduler again.
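A minimal sketch of such retry logic could look like the following. The helper name `job_status_with_retry`, its parameters, and the default retry count and delay are illustrative choices of mine, not part of drmaa-python. The sketch also assumes that the undetermined state compares equal to the string `"undetermined"`; pass `drmaa.JobState.UNDETERMINED` explicitly if you want to avoid relying on that.

```python
import time

# Assumption: drmaa-python reports the undetermined state as this string.
# Pass drmaa.JobState.UNDETERMINED explicitly if you prefer not to rely on it.
UNDETERMINED = "undetermined"


def job_status_with_retry(get_status, job_id, retries=3, delay=1.0,
                          undetermined=UNDETERMINED, sleep=time.sleep):
    """Ask for a job's status, re-asking while it comes back undetermined.

    get_status is any callable that takes a job id and returns a status,
    for example the jobStatus method of an open drmaa.Session.
    The job is queried at most retries + 1 times in total.
    """
    status = get_status(job_id)
    for _ in range(retries):
        if status != undetermined:
            break
        sleep(delay)  # give the scheduler a moment before asking again
        status = get_status(job_id)
    return status


# Hypothetical usage with a real session (requires drmaa-python and SGE):
#
#   import drmaa
#   with drmaa.Session() as session:
#       state = job_status_with_retry(
#           session.jobStatus, job_id,
#           undetermined=drmaa.JobState.UNDETERMINED)
```

If the status is still undetermined after the last retry, the function returns it unchanged, so the GUI can still show Status Undetermined for jobs that really cannot be resolved.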