I'm working on an SGE cluster and having some problems with the qsub email notification system. All of my jobs run perfectly, but I can't seem to change the default behaviour, which is to notify only when a job is aborted. The -M flag works correctly, and I do receive an email when a job is aborted; however, I would like to get an email when a job begins, ends, is aborted, or is suspended. I am using the following flags (and more) in my scripts. Is there something stupid that I am missing?
#!/bin/bash
#$ -S /bin/bash
#$ -M email@server
#$ -m beas
program
It also does not work when I try the following:
qsub -M email@server -m baes script.sh
Is this an issue that I should take up with my cluster sys admins, or have I done something incorrectly?
Thanks for your help.
The important thing to understand in solving this problem is that your job status email will be sent by the node where the job runs. For example, I have a test job with the following output:
Now run the job and see which node it ran on.
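As a rough illustration of that step, here is a minimal sketch of a test job that records where it ran, reusing the flags from the question. The helper name report_host is made up for this example, and the fallback value for JOB_ID is only there so the script also runs outside SGE (inside a job, SGE sets JOB_ID itself).

```shell
#!/bin/bash
#$ -S /bin/bash
#$ -M email@server
#$ -m beas

# Hypothetical helper: print the name of the host executing this script.
# JOB_ID is set by SGE inside a job; fall back to "none" elsewhere.
report_host() {
    echo "Job ${JOB_ID:-none} running on host: $(uname -n)"
}

report_host
```

After submitting this with qsub, the line should end up in the job's stdout file. If accounting is enabled on your cluster, qacct -j with the job ID should also show a hostname field once the job has finished.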
If you look at the mail logs on that node, you'll see the delivery attempts that were made, and you'll have to diagnose from there. Here are a few examples of failures (or even successes that aren't successful in the way you want them to be):
Sent to the compute node address, using
-M pkenyon
Head node MX record not set up correctly, using
-M [email protected]
You need to set up your system to use a local mail relay if using
-M [email protected]
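To look for delivery attempts like the ones above, something along these lines can be run on the node where the job executed. This is only a sketch: find_deliveries is a hypothetical helper, and the log path in the example call is an assumption, since the location depends on the distribution and mail daemon and usually requires root to read.

```shell
#!/bin/bash
# Hypothetical helper: list mail log lines that mention a given address.
# Usage: find_deliveries ADDRESS LOGFILE
find_deliveries() {
    local address="$1"
    local logfile="$2"
    # -F treats the address as a fixed string, so "." is not a regex wildcard
    grep -F -- "$address" "$logfile"
}

# Example call; the log path is an assumption and varies by system
# (commonly /var/log/maillog or /var/log/mail.log):
# find_deliveries email@server /var/log/maillog
```

If nothing shows up for your address, the mail may never have left the compute node, which is useful information to pass on to your admins.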
So yes, you need to talk to your cluster sysadmins, but these are the first steps to figuring out where your SGE emails are getting stuck. With a little more information, your admins will be able to fix the configuration issue and help you get more out of your cluster environment.