Storing the result of a shell command

494 views Asked by At

As you can read on the title I am interested on storing the result of a shell command and pass it to another rule.

Bellow are my rules:

SAMTOOLS = config["SAMTOOLS"]
rule useDepth:
  input:
     depth = "{individual}_{chr}.fixmate.sort.rgmdup.bam.depth"
  output:
     tmpVCF = "{individual}_{chr}.vcf"
  run:
     depth = storage.fetch("chrDepth")
     shell("echo {depth} | exit 1")

rule calDepth:
  input:
     bam = "{individual}.fixmate.sort.rgmdup.bam"
  output:
     temp("{individual}_{chr}.fixmate.sort.rgmdup.bam.depth")
  run:
     import subprocess,shlex
     depth=subprocess.check_output(shlex.split("{SAMTOOLS} depth -r {wildcards.chr} {input.bam} | awk '{{sum += $3}} END {{print sum / NR}}'"),shell=True)
     storage.store("chrDepth", depth)
     shell("echo \"Depth for {wildcards.chr} has been calculated\" > {output[0]}")

For sure I received an error here because of exit 1! But that just for testing.

The error which I am trying to solve is the value of {SAMTOOLS} in subprocess.check_output()!

depth: 1: depth: {SAMTOOLS}: not found
Error in job chrDepth while creating output file
RuleException:
Command '['{SAMTOOLS}', 'depth', '-r', '{wildcards.chr}', '{input.bam}', '|', 'awk', '{{sum += $3}} END {{print sum / NR}}']'

To be more informative, because diffrent user might install samtools in different place we make the address of samtools configurable through configfile. However, here I can't:

1) Read the correct value of {SAMTOOLS}!

2) Make the whole command runnable!

So, could you please tell me if there is any other way to store/pass the output of a rule to another rule!? More especificaly how can I enhance snakemake to tell shell that the {SAMTOOLS} is available.

Thanks!

1

There are 1 answers

1
TBoyarski On BEST ANSWER

This is you setting up the access to use as a Python variable.

SAMTOOLS = config["SAMTOOLS"]

But you try to access it here as, via {SAMTOOLS}, as a Snakemake rule specific wildcard:

depth=subprocess.check_output(shlex.split("{SAMTOOLS} depth -r {wildcards.chr} {input.bam} | awk '{{sum += $3}} END {{print sum / NR}}'"),shell=True)

Snakemake wildcards are not accessed the same way as Python variables. Furthermore, {SAMTOOLS} here is being accessed as a Snakemake wildcard, yet you do not use it as a wildcard in the rule's output's.

Assuming that {wildcards.chr} works, and that the {SAMTOOLS} call was the only wildcard not being found (not just the first unknown wildcard), I think you should try either of two things.

No pre-assignment:

depth=subprocess.check_output(shlex.split("config['SAMTOOLS'] depth -r {wildcards.chr} {input.bam} | awk '{{sum += $3}} END {{print sum / NR}}'"),shell=True)

Access it as the python variable as a string (it's an object representing a string):

depth=subprocess.check_output(shlex.split(SAMTOOLS + " depth -r {wildcards.chr} {input.bam} | awk '{{sum += $3}} END {{print sum / NR}}'"),shell=True)

Lastly, and least recommended due to the rule-rule coupling it introduces, there are ways to pass variables across rules in Snakemake, and you already are using it, however, I do not think that is what is needed here. Proper access and design should be enough as is.

Snakemake Tutorial FAQ: How to pass variables between rules

Side note

To eliminate the passed of char depth across rules, and also to save it as path of the file name, and decouple the rules, I highly recommend converting chrDepth to an naming wildcard...

Something like ...

rule useDepth:
  input:
     depth = "{individual}_{chr}_of_{chrDepth}.fixmate.sort.rgmdup.bam.depth"
  output:
     tmpVCF = "{individual}_{chr}_of{chrDepth}.vcf"

But I'm not sure how you are calculating chrDepth. It concerns me that you are passing it between all these rules, and not just relying on good naming conventions. It could needlessly couple your code, causing issues and overhead downstream.