Setting the SGE cluster job name with Snakemake while using DRMAA?


Problem

I'm running Snakemake on an SGE cluster through DRMAA, and I'm not sure the -N argument is being applied. Everything works except for the -N argument.

  • Snakemake requires a valid -N call
  • It doesn't set the job name properly.

The job name always reverts to the default. This is my call; the result is the same with or without the -N argument.

snakemake --jobs 100 --drmaa "-V -S /bin/bash -o log/mpileup/mpileupSPLIT -e log/mpileup/mpileupSPLIT -l h_vmem=10G -pe ncpus 1 -N {rule}.{wildcards}.varScan"

The only way I have found to influence the job name is to use --jobname.

snakemake --jobs 100 --drmaa "-V -S /bin/bash -o log/mpileup/mpileupSPLIT -e log/mpileup/mpileupSPLIT -l h_vmem=10G -pe ncpus 1 -N {rule}.{wildcards}.varScan" --jobname "{rule}.{wildcards}.{jobid}"

Background

I've tried a variety of things. I usually use a cluster configuration file, but that isn't working either, so in the calls above I dropped the config file to confirm that it's the '-N' option that isn't being applied.

My usual call is:

snakemake --drmaa "{cluster.clusterSpec}" --jobs 10 --cluster-config input/config.json 

1) If I use '-n' instead of '-N', I receive a workflow error:

drmaa.errors.DeniedByDrmException: code 17: ERROR! invalid option argument "-n"

2) If I use '-N', but give it an incorrect wildcard, say {rule.name}:

AttributeError: 'str' object has no attribute 'name'

3) I cannot use both --drmaa AND --cluster:

 snakemake: error: argument --cluster/-c: not allowed with argument --drmaa

4) If I specify the {jobid} in the config.json file, then Snakemake doesn't know what to do with it.

RuleException in line 13 of /extscratch/clc/projects/tboyarski/gitRepo-LCR-BCCRC/Snakemake/modules/mpileup/mpileupSPLIT:
NameError: The name 'jobid' is unknown in this context. Please make sure that you defined that variable. Also note that braces not used for variable access have to be escaped by repeating them, i.e. {{print $1}}

EDIT Added #5 w/ Solution

5) I can set the job name using the config.json and just concatenate the jobid on afterwards in my snakemake call. That way I have a generic snakemake call (--jobname "{cluster.jobName}.{jobid}"), and a highly configurable and specific job name ({rule}-{wildcards.sampleMPUS}_chr{wildcards.chrMPUS}) which results in:

mpileupSPLIT-Pfeiffer_chr19.1.e7152298

The 1 is the Snakemake jobid according to the DAG. The 7152298 is my cluster's job number.

2nd EDIT - Just tried v3.12, same thing. Concatenation must occur in snakemake call.
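The errors in cases 2) and 4) look like what ordinary Python string formatting produces when a placeholder cannot be resolved. The sketch below uses plain `str.format` to reproduce that behaviour. It is an illustration only, not Snakemake's own code, and the helper `try_format` is a hypothetical name. Plain `str.format` raises `KeyError` for a missing name, whereas Snakemake reported a `NameError` above.

```python
def try_format(template, **names):
    """Return the formatted string, or the exception type and message on failure."""
    try:
        return template.format(**names)
    except (AttributeError, KeyError) as err:
        return f"{type(err).__name__}: {err}"


# Case 2: 'rule' is supplied as a plain string, so '.name' cannot be looked up.
case2 = try_format("{rule.name}", rule="mpileupSPLIT")

# Case 4: the template mentions {jobid}, but no 'jobid' is available
# at the point where the template is expanded.
case4 = try_format("{rule}.{jobid}", rule="mpileupSPLIT")

print(case2)
print(case4)
```

If the cluster config values are expanded before a job id is available to them, that would explain why {jobid} only works when it is written in the --jobname argument itself.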

Alternative solution

I would also be okay with something like this:

snakemake --drmaa "{cluster.clusterSpec}" --jobname "{cluster.jobName}" --jobs 10 --cluster-config input/config.json

With my cluster file like this:

"mpileupSPLIT": {
    "clusterSpec": "-V -S /bin/bash -o log/mpileup/mpileupSPLIT -e log/mpileup/mpileupSPLIT -l h_vmem=10G -pe ncpus 1 -n {rule}.{wildcards}.varScan",
    "jobName": "{rule}-{wildcards.sampleMPUS}_chr{wildcards.chrMPUS}.{jobid}"
}

Documentation Reviewed

I've read the documentation but I was unable to figure it out.

  1. http://snakemake.readthedocs.io/en/latest/executable.html?-highlight=job_name#cluster-execution

  2. http://snakemake.readthedocs.io/en/latest/snakefiles/configuration.html#snakefiles-cluster-configuration

  3. https://groups.google.com/forum/#!topic/snakemake/whwYODy_I74

System

  • Snakemake v3.10.2 (will try the newest conda version tomorrow)
  • Red Hat Enterprise Linux Server release 5.4
  • SGE cluster

Answer by TBoyarski (best answer)

Solution

Use '--jobname' in your snakemake call instead of '-N' in the qsub parameters passed to --drmaa.

Set up your cluster config file with a per-rule parameter for the job name. In this case, these are the overrides for my Snakemake rule named "mpileupSPLIT":

"mpileupSPLIT": {
  "clusterSpec": "-V -S /bin/bash -o log/mpileup/mpileupSPLIT -e log/mpileup/mpileupSPLIT -l h_vmem=10G -pe ncpus 1",
  "jobName": "{rule}-{wildcards.sampleMPUS}_chr{wildcards.chrMPUS}"
}

Use a generic Snakemake call that includes {jobid}, e.g. --jobname "{cluster.jobName}.{jobid}". On a cluster (SGE), the 'jobid' variable contains both the Snakemake job number and the cluster job number. Both are valuable: the former corresponds to the Snakemake DAG, and the latter is for cluster logging.
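As a rough illustration of how the two pieces combine, the sketch below expands the per-rule jobName from the config first and then the generic --jobname template. It uses plain `str.format` and `SimpleNamespace` as stand-ins and is not Snakemake's internal implementation. The function `expand_jobname` is a hypothetical name, and the wildcard values are the ones from the example job name in the question. The cluster job number is not reproduced here.

```python
import json
from types import SimpleNamespace

# The cluster config from the answer, parsed as JSON.
cluster_config = json.loads("""
{
  "mpileupSPLIT": {
    "clusterSpec": "-V -S /bin/bash -o log/mpileup/mpileupSPLIT -e log/mpileup/mpileupSPLIT -l h_vmem=10G -pe ncpus 1",
    "jobName": "{rule}-{wildcards.sampleMPUS}_chr{wildcards.chrMPUS}"
  }
}
""")


def expand_jobname(jobname_template, rule, wildcards, jobid, config):
    """Expand the per-rule jobName, then the generic --jobname template."""
    entry = config[rule]
    # Stage 1: the config value only sees the rule name and its wildcards.
    cluster = SimpleNamespace(
        jobName=entry["jobName"].format(rule=rule, wildcards=wildcards)
    )
    # Stage 2: the generic template sees the expanded config and the job id.
    return jobname_template.format(cluster=cluster, jobid=jobid)


wildcards = SimpleNamespace(sampleMPUS="Pfeiffer", chrMPUS="19")
name = expand_jobname(
    "{cluster.jobName}.{jobid}", "mpileupSPLIT", wildcards, 1, cluster_config
)
print(name)  # mpileupSPLIT-Pfeiffer_chr19.1
```

Keeping {jobid} out of the config means each rule's entry only describes what is specific to that rule, while the snakemake call stays the same for every rule.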

EDIT Added solution to resolve post.