How to use Snakemake's allow_missing=True properly? Partial wildcards?

I have a list of input files in different subfolders, and each folder has a different number of files. The paths use two wildcards, SAMPLE and id. The output names will contain these as well:

SAMPLE=set(["x","y","z"])

with open(config["path"]+"barcodes.txt") as f: id = [line.rstrip() for line in f]

rule all:
    input:
        expand(config["path"]+ "{sample}/remap/filtered.{sample}.R1.clean.id_{ID}.fq.bam", sample=SAMPLE, ID=id, allow_missing=True)



rule map_again:
    output:
        config["path"]+ "{sample}/remap/filtered.{sample}.R1.clean.id_{ID}.fq.bam"
    input:
        expand(config["path"]+ "{{sample}}/map/filtered.{{sample}}.R1.clean.id_{ID}.fq.gz", sample=SAMPLE, allow_missing=True)
    shell:
        "squire Map -1 {input} -r 150 -p 10 "

However, I still get warnings from Snakemake that certain combinations of the wildcards don't exist, although I hoped it would ignore those.

How could I correct this?

Thank you very much!

There is 1 answer

SultanOrazbayev On

The current version of rule all contains a redundant kwarg, allow_missing. Every wildcard in the pattern is already provided there, so the rule can be written without it:

rule all:
    input:
        expand(config["path"]+ "{sample}/remap/filtered.{sample}.R1.clean.id_{ID}.fq.bam", sample=SAMPLE, ID=id)

This is because allow_missing is just a convenience kwarg that allows providing a partial list of wildcards to the expansion. This means that the result of the expansion with this kwarg will still contain the wildcards that were not provided, left in place as placeholders. Example borrowed from this answer:

expand("text_{letter}_{num}.txt", num=[1, 2], allow_missing=True)
# ["text_{letter}_1.txt", "text_{letter}_2.txt"]
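
To make the behaviour concrete, here is a minimal pure-Python sketch that imitates this partial expansion. The helper name partial_expand is hypothetical and this is not Snakemake's implementation; it only illustrates that nothing is filtered out, the unprovided wildcards are simply left untouched:

```python
import re
from itertools import product


def partial_expand(pattern, **wildcards):
    """Fill in only the wildcards that were provided; keep the rest as "{name}".

    Hypothetical helper for illustration, not part of Snakemake.
    """
    names = list(wildcards)
    results = []
    # One result per combination of the provided wildcard values.
    for combo in product(*(wildcards[name] for name in names)):
        values = dict(zip(names, combo))
        # Replace known placeholders, leave unknown ones exactly as written.
        filled = re.sub(
            r"\{(\w+)\}",
            lambda m: str(values.get(m.group(1), m.group(0))),
            pattern,
        )
        results.append(filled)
    return results


print(partial_expand("text_{letter}_{num}.txt", num=[1, 2]))
```

So allow_missing changes which placeholders get substituted, not which files are expected to exist.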

From the statement of the question, it seems that you would like to ignore missing combinations of files. One way to achieve this is to define a custom function and provide it as input. For example:

def find_available_files(wildcards):
    from glob import glob
    path = config["path"]+ "{sample}/map/filtered.{sample}.R1.clean.id_{ID}.fq.gz"
    # ID="*" turns the pattern into a glob that matches every ID present for this sample
    files = glob(path.format(sample=wildcards.sample, ID="*"))
    return files

rule map_again:
    output:
        config["path"]+ "{sample}/remap/filtered.{sample}.R1.clean.id_{ID}.fq.bam"
    input:
        find_available_files
    shell:
        "squire Map -1 {input} -r 150 -p 10 "
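
Note that the input function only affects map_again. A rule all built with expand over SAMPLE and id still requests every combination, so the targets may need to be restricted to what is on disk as well. The following is a rough sketch under the assumption that the directory layout matches the patterns in the question; the helper name existing_targets is hypothetical. Snakemake's own glob_wildcards function serves a similar purpose and may be the more idiomatic choice inside a Snakefile.

```python
import os
import re
from glob import glob


def existing_targets(base):
    """Build .bam target paths only for (sample, ID) pairs whose input exists.

    Hypothetical helper; assumes inputs live at
    <base>/<sample>/map/filtered.<sample>.R1.clean.id_<ID>.fq.gz
    """
    in_pattern = os.path.join(base, "*", "map", "filtered.*.R1.clean.id_*.fq.gz")
    name_re = re.compile(
        r"filtered\.(?P<sample>.+)\.R1\.clean\.id_(?P<ID>.+)\.fq\.gz$"
    )
    targets = []
    for path in sorted(glob(in_pattern)):
        match = name_re.search(os.path.basename(path))
        if match is None:
            continue
        # The sample is the directory two levels above the file.
        sample_dir = os.path.basename(os.path.dirname(os.path.dirname(path)))
        if match["sample"] != sample_dir:
            continue  # skip files whose name disagrees with their folder
        targets.append(
            os.path.join(
                base,
                sample_dir,
                "remap",
                f"filtered.{sample_dir}.R1.clean.id_{match['ID']}.fq.bam",
            )
        )
    return targets
```

In the Snakefile, the input of rule all could then be existing_targets(config["path"]) instead of the full expand call.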