LSF job states for a given job


Let's say my job ran for some time, was suspended because the machine was overloaded, resumed running after a while, and then completed. The job therefore went through the states RUNNING -> SUSPEND -> RUNNING.

How can I get all the states a given job has gone through?

Answer by Michael Closson:

Use bjobs -l if the job hasn't been cleaned from the system yet.

Otherwise, use bhist -l. You might need -n to search older event log files, depending on how old the job is.

Here's an example of bhist -l output for a job that was suspended because the system load temporarily exceeded the configured threshold, and was later resumed.

$ bhist -l 1168

Job <1168>, User <mclosson>, Project <default>, Command <sleep 10000>
Fri Jan 20 15:08:40: Submitted from host <hostA>, to 
                 Queue <normal>, CWD <$HOME>, Specified Hosts <hostA>;
Fri Jan 20 15:08:41: Dispatched 1 Task(s) on Host(s) <hostA>, Allocated 1 Slot(
                 s) on Host(s) <hostA>, Effective RES_REQ <select[type == any] or
                 der[r15s:pg] >;
Fri Jan 20 15:08:41: Starting (Pid 30234);
Fri Jan 20 15:08:41: Running with execution home </home/mclosson>, Execution CW
                 D </home/mclosson>, Execution Pid <30234>;
Fri Jan 20 16:19:22: Suspended:  Host load exceeded threshold:  1-minute CPU ru
                 n queue length (r1m)
Fri Jan 20 16:21:43: Running;

Summary of time in seconds spent in various states by  Fri Jan 20 16:22:09
  PEND     PSUSP    RUN      USUSP    SSUSP    UNKWN    TOTAL
  1        0        4267     0        141      0        4409        

At 16:19:22 the job was suspended because r1m exceeded the threshold. At 16:21:43 the job resumed. The 141 seconds between those two events match the SSUSP column in the summary.
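
If you want the state sequence programmatically, one option is to parse the bhist -l text. The following is only a rough sketch in Python, not an official LSF interface. It assumes event lines start with a timestamp as in the output above, and the function name job_states, the EVENT_STATES mapping, and the coarse labels PEND/RUN/SUSP are my own choices for illustration. It only knows the event words that appear in this example, so other events (user suspension, exit, done) would need extra entries.

```python
import re

# Event lines in `bhist -l` output start with a timestamp such as
# "Fri Jan 20 15:08:40: "; wrapped continuation lines are indented instead.
EVENT_RE = re.compile(r"^(\w{3} \w{3} [ \d]\d \d{2}:\d{2}:\d{2}): (.*)$")

# Assumed mapping from the first word of an event to a coarse state label.
# Only events seen in the example output are covered here.
EVENT_STATES = {
    "Submitted": "PEND",
    "Running": "RUN",
    "Suspended": "SUSP",
}

# Shortened copy of the example output, used for demonstration.
SAMPLE = """\
Job <1168>, User <mclosson>, Project <default>, Command <sleep 10000>
Fri Jan 20 15:08:40: Submitted from host <hostA>, to
                 Queue <normal>, CWD <$HOME>, Specified Hosts <hostA>;
Fri Jan 20 15:08:41: Dispatched 1 Task(s) on Host(s) <hostA>, Allocated 1 Slot(
                 s) on Host(s) <hostA>;
Fri Jan 20 15:08:41: Starting (Pid 30234);
Fri Jan 20 15:08:41: Running with execution home </home/mclosson>, Execution CW
                 D </home/mclosson>, Execution Pid <30234>;
Fri Jan 20 16:19:22: Suspended:  Host load exceeded threshold:  1-minute CPU ru
                 n queue length (r1m)
Fri Jan 20 16:21:43: Running;

Summary of time in seconds spent in various states by  Fri Jan 20 16:22:09
"""


def job_states(bhist_output):
    """Return (timestamp, state) pairs, one per change of state, in log order."""
    states = []
    for line in bhist_output.splitlines():
        match = EVENT_RE.match(line)
        if not match:
            continue  # header, continuation, blank, or summary line
        timestamp, text = match.groups()
        # First word of the event text, e.g. "Suspended" from "Suspended:  Host..."
        first_word = re.split(r"[\s:;]", text, maxsplit=1)[0]
        state = EVENT_STATES.get(first_word)
        # Skip events with no mapped state and repeats of the current state.
        if state and (not states or states[-1][1] != state):
            states.append((timestamp, state))
    return states


if __name__ == "__main__":
    for timestamp, state in job_states(SAMPLE):
        print(timestamp, state)
```

In practice the text would come from running bhist -l with the job ID (for example through Python's subprocess module) rather than from a hard-coded string.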