Running multiple exec commands and waiting to finish before continuing


I know I ask a lot of questions, and I know there is a lot on here about exactly what I am trying to do, but I have not been able to get it to work in my script or figure out why it does not work. I am trying to run several commands in the background using exec. The tests can take anywhere between 5 and 45 minutes each (longer if they have to queue for a license), so running them back to back takes forever. What do I need to do to make my script wait for all of them to finish before moving on to the next section of the script?

while {$cnt <= $len} {
# Begin count for running tests
set testvar [lindex $f $cnt]
if {[file exists $path0/$testvar] == 1} {
    cd $testvar
} else {
    exec mkdir $testvar
    cd $testvar
    exec create_symobic_link_here
}
# Set up test environment
exec -ignorestderr make clean
exec -ignorestderr make depends
puts "Running $testvar"
set runtest [eval exec -ignorestderr bsub -I -q lin_i make $testvar SEED=1 VPDDUMP=on |tail -n 1 >> $path0/runtestfile &]
cd ../
incr cnt
}

I know there is nothing here to make the script wait for the processes to finish, but I have tried many different things and this is the only way I can get it to run everything. It just doesn't wait.

There is 1 answer

Answer by Brad Lanam (best answer)

One way is to modify your tests to create a "finished" file. This file should be created whether the test completes correctly or fails.

Modify the startup loop to remove this file before starting the test:

 catch { file delete $path0/$testvar/finished }
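If changing the tests themselves is inconvenient, the marker could instead be written by a small shell wrapper around the test command. The following is only a sketch: the proc name start_with_marker is hypothetical, it assumes a POSIX sh and touch are available, and it assumes the marker directory is given as an absolute path (or one relative to the directory the script is in when the job starts).

```tcl
# Start `command` in the background and create <dir>/finished when it ends,
# whether it succeeded or failed. Returns what [exec ... &] returns:
# the list of background process ids.
proc start_with_marker {dir command} {
    set marker [file join $dir finished]
    # Remove a stale marker left over from a previous run.
    file delete $marker
    # In sh, "a ; b" runs b regardless of a's exit status.
    # The marker path is passed as $1 so it needs no extra quoting.
    return [exec sh -c "$command ; touch \"\$1\"" sh $marker &]
}
```

In the startup loop this would stand in for the plain exec of the test command, for example start_with_marker $path0/$testvar "make $testvar SEED=1 VPDDUMP=on".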

Then create a second loop:

while { true } {
  after 60000
  set cnt 1 ; # ?
  set result 0
  while { $cnt <= $len } {
    set testvar [lindex $f $cnt]
    if { [file exists $path0/$testvar/finished] } {
      incr result
    }
    incr cnt
  }
  if { $result == $len } {
    break
  }
}

This loop as written will never exit if any test fails to create the 'finished' file, so I would add a second stop condition (for example, give up after one hour) to exit the loop.
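A minimal sketch of the marker-file loop with such a time limit follows. The proc name wait_for_markers and its parameters are illustrative, not part of the original script. It returns the tests whose marker is still missing, so the caller can tell a clean finish from a timeout.

```tcl
# Poll until every test directory under basedir contains a "finished" file,
# or until timeoutSec seconds have passed.
# Returns the list of tests still missing the marker (empty when all are done).
proc wait_for_markers {basedir tests timeoutSec {pollMs 60000}} {
    set deadline [expr {[clock seconds] + $timeoutSec}]
    while {1} {
        set pending {}
        foreach t $tests {
            if {![file exists [file join $basedir $t finished]]} {
                lappend pending $t
            }
        }
        # Stop when nothing is pending or when the time limit is reached.
        if {[llength $pending] == 0 || [clock seconds] >= $deadline} {
            return $pending
        }
        after $pollMs
    }
}
```

With the question's variables, a call such as set late [wait_for_markers $path0 $f 3600] would wait at most one hour, assuming $f holds exactly the test names that were started.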

Another way is to save the process ids of the background processes in a list and have the second loop check whether each one is still running. This method requires no changes to the tests, but it is a little harder to implement (not too hard on Unix/Mac, harder on Windows).

Edit: loop using process id check:

To use process ids, the main loop needs to be modified to save the process ids of the background jobs:

Before the main loop starts, clear the process id list:

set pidlist {}

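In the main loop, save the process id from the exec command. In Tcl, [exec ... &] returns a list of background process ids, one for each command in the pipeline, so for "bsub ... | tail ..." keep only the last one (tail exits only after bsub's output has ended):

lappend pidlist [lindex $runtest end] ; # goes after the exec bsub...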

A procedure to check for the existence of a process (for unix/mac). Tcl/Tk does not have any process control commands, so the unix 'kill' command is used. 'kill -0' on unix only checks for process existence, and does not affect the execution of the process.

# return 0 if the process does not exist, 1 if it does
proc checkpid { ppid } {
  set pexists [catch {exec kill -0 $ppid}]
  return [expr {1-$pexists}]
}

And the second loop to check to see if the tests are done becomes:

set tottime 0
while { true } {
  after 60000
  incr tottime 1 ; # in minutes
  set result 0
  foreach {pid} $pidlist {
    if { ! [checkpid $pid] } {
      incr result
    }
  }
  if { $result == [llength $pidlist] } {
    break
  }
  if { $tottime > 120 } {
    puts "Total test time exceeded."
    break ; # or exit 
  }
}

If a test process hangs and never exits, the process check alone would never end the loop, so a second stop condition on total time is used.
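For comparison, here is a sketch of a different technique from the polling loops above: open each command as a pipeline with [open "|..."] and let the Tcl event loop signal when its output reaches end-of-file. The proc names (start_job, job_readable, wait_all_jobs) and the global variables are illustrative only. The sketch assumes each command is given as a Tcl list and that capturing its output in the script is acceptable.

```tcl
set ::jobs_running 0    ;# number of pipelines not yet finished
set ::job_failures {}   ;# error messages from commands that failed

# Start cmd (a Tcl list) as a background pipeline and watch its output.
proc start_job {cmd} {
    incr ::jobs_running
    # 2>@1 folds the command's stderr into the output we read.
    set chan [open |[concat $cmd [list 2>@1]] r]
    fconfigure $chan -blocking 0
    fileevent $chan readable [list job_readable $chan]
}

# Called by the event loop whenever chan has data or has reached EOF.
proc job_readable {chan} {
    append ::job_output($chan) [read $chan]
    if {[eof $chan]} {
        # Switch back to blocking mode so close reports the exit status.
        fconfigure $chan -blocking 1
        if {[catch {close $chan} msg]} {
            lappend ::job_failures $msg
        }
        incr ::jobs_running -1
    }
}

# Run the event loop until every started job has finished.
proc wait_all_jobs {} {
    while {$::jobs_running > 0} {
        vwait ::jobs_running
    }
}
```

The main loop would call start_job once per test and the script would then call wait_all_jobs before moving on. As with the loops above, a hung test would block the wait forever unless a time limit is added, for example with an after timer.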