Running multiple Snakemake rules


I would like to run multiple rules one after another using Snakemake. However, when I run this script, the bam_list rule runs before the samtools_markdup rule and fails because it cannot find its input files, which obviously have not been generated yet. How can I solve this problem?

rule all:
    input:
        expand("dup/{sample}.dup.bam", sample=SAMPLES),
        "dup/bam_list"

rule samtools_markdup:
    input:
        sortbam = "rg/{sample}.rg.bam"
    output:
        dupbam = "dup/{sample}.dup.bam"
    threads: 5
    shell:
        """
        samtools markdup -@ {threads} {input.sortbam} {output.dupbam}
        """

rule bam_list:
    output:
        outlist = "dup/bam_list"
    shell:
        """
        ls dup/*.bam > {output.outlist}
        """

Best answer, by Troy Comi:

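Snakemake is following directions: you asked for dup/bam_list, and the bam_list rule declares no inputs, so it can be produced right away, before any samtools_markdup job has run. Snakemake orders jobs only by the files they declare as input and output; it does not inspect the shell command, so it cannot know that ls dup/*.bam needs the deduplicated BAMs. I think what you mean to have is: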

rule all:
    input: 
        "dup/bam_list"

rule samtools_markdup:
    input:
        sortbam = "rg/{sample}.rg.bam"
    output:
        dupbam = "dup/{sample}.dup.bam"
    threads: 5
    shell:
        """
        samtools markdup -@ {threads} {input.sortbam} {output.dupbam}
        """

rule bam_list:
    input: 
        expand("dup/{sample}.dup.bam", sample=SAMPLES)
    output:
        outlist = "dup/bam_list"
    shell:
        """
        ls dup/*.bam > {output.outlist}
        """

Now bam_list will wait until all the samtools_markdup jobs have completed. As an aside, I expect the contents of dup/bam_list to be identical to expand("dup/{sample}.dup.bam", sample=SAMPLES), so if you use the file later in the workflow, you can probably use the expand(...) output directly instead.
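To make the ordering argument concrete, here is a toy Python model of "jobs are ordered only by declared inputs and outputs". It is a sketch under stated assumptions, not how Snakemake is implemented: wildcards are resolved up front by a simplified, hypothetical `expand_one` helper (real `expand` handles several wildcards), Python's standard `graphlib.TopologicalSorter` (3.9+) stands in for Snakemake's scheduler, the `rule all` target is left out, and the sample names and job labels are made up.

```python
# Toy model (NOT Snakemake's actual code): job order derived purely from the
# files each job declares as inputs and outputs. Shell commands are ignored,
# just as Snakemake ignores them when building its job graph.
from graphlib import TopologicalSorter

SAMPLES = ["A", "B"]  # hypothetical sample names


def expand_one(pattern, samples):
    """Simplified stand-in for Snakemake's expand() with one {sample} wildcard."""
    return [pattern.format(sample=s) for s in samples]


def markdup_jobs(samples):
    """One samtools_markdup job per sample, as (inputs, outputs)."""
    return {
        f"samtools_markdup[{s}]": ([f"rg/{s}.rg.bam"], [f"dup/{s}.dup.bam"])
        for s in samples
    }


def dependency_graph(jobs):
    """Map each job to the set of jobs that produce its declared inputs."""
    producer = {out: name for name, (_, outs) in jobs.items() for out in outs}
    # Inputs nobody produces (rg/*.rg.bam) are treated as existing source files.
    return {
        name: {producer[f] for f in ins if f in producer}
        for name, (ins, _) in jobs.items()
    }


def first_wave(jobs):
    """Jobs with no unfinished dependencies, i.e. allowed to start immediately."""
    sorter = TopologicalSorter(dependency_graph(jobs))
    sorter.prepare()
    return set(sorter.get_ready())


# Question's version: bam_list declares no inputs.
question_jobs = {**markdup_jobs(SAMPLES), "bam_list": ([], ["dup/bam_list"])}

# Answer's version: bam_list declares every deduplicated BAM as an input.
answer_jobs = {
    **markdup_jobs(SAMPLES),
    "bam_list": (expand_one("dup/{sample}.dup.bam", SAMPLES), ["dup/bam_list"]),
}

print(sorted(first_wave(question_jobs)))
# -> ['bam_list', 'samtools_markdup[A]', 'samtools_markdup[B]']
print(sorted(first_wave(answer_jobs)))
# -> ['samtools_markdup[A]', 'samtools_markdup[B]']
print(list(TopologicalSorter(dependency_graph(answer_jobs)).static_order())[-1])
# -> bam_list
```

In the question's version, bam_list is in the first wave alongside the markdup jobs, so nothing prevents it from running before any dup/*.dup.bam exists; once the BAMs are declared as inputs, it can only run last. The `expand_one` result is also the list of names the answer expects dup/bam_list to contain, although `ls dup/*.bam` prints them in alphabetical order and would also pick up any other .bam file lying in dup/.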