I am new to Snakemake and would like to take either a pair of .fq files or a pair of .fq.gz files and run them through trim_galore to get a pair of trimmed .fq.gz output files. Without giving all of my Snakefile, I have the ugly solution below, where I just copied the rule and changed the inputs. What would be a better solution?
# Trim galore paired end trimming rule for unzipped fastqs:
rule trim_galore_unzipped_PE:
    input:
        r1=join(config['fq_in_path'], '{sample}1.fq'),
        r2=join(config['fq_in_path'], '{sample}2.fq'),
    output:
        r1=join(config['trim_out_path'], '{sample}1_val_1.fq.gz'),
        r2=join(config['trim_out_path'], '{sample}2_val_2.fq.gz'),
    params:
        out_path=config['trim_out_path'],
    conda:
        'envs/biotools.yaml',
    shell:
        'trim_galore --gzip -o {params.out_path} --paired {input.r1} {input.r2}'
# Trim galore paired end trimming rule for gzipped fastqs:
rule trim_galore_zipped_PE:
    input:
        r1=join(config['fq_in_path'], '{sample}1.fq.gz'),
        r2=join(config['fq_in_path'], '{sample}2.fq.gz'),
    output:
        r1=join(config['trim_out_path'], '{sample}1_val_1.fq.gz'),
        r2=join(config['trim_out_path'], '{sample}2_val_2.fq.gz'),
    params:
        out_path=config['trim_out_path'],
    conda:
        'envs/biotools.yaml',
    shell:
        'trim_galore --gzip -o {params.out_path} --paired {input.r1} {input.r2}'
Using an input function is likely the best solution, as follows:
Notes:
Snakefile:
config.yaml:
$ tree:
$ snakemake --dryrun (input: fq)
$ snakemake --dryrun (input: fq.gz)
There are ways to make it more generic, but since you use the YAML config to build most of the file name, I will not go into them in this answer. Just know that it's possible and somewhat encouraged.
The "--paired {input}" will expand to provide both files. Because the for-loop goes over the read numbers in order, the 1 will always come before the 2.
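For reference, here is a minimal sketch of what such an input function could look like. It is not the exact code from this answer: the helper name find_fastq_pair, its extensions parameter, and the choice to try .fq.gz before .fq are all assumptions made for illustration. It only relies on the file naming from the question ('{sample}1.fq' / '{sample}2.fq', optionally with .gz).

```python
import os
from os.path import join


# Hypothetical helper (name and signature are assumptions, not Snakemake API):
# find the read pair for one sample, whichever compression state is on disk.
def find_fastq_pair(fq_dir, sample, extensions=(".fq.gz", ".fq")):
    """Return [read-1 path, read-2 path] for the first extension where both exist."""
    for ext in extensions:
        pair = []
        # Looping over (1, 2) in order keeps read 1 ahead of read 2.
        for read in (1, 2):
            pair.append(join(fq_dir, f"{sample}{read}{ext}"))
        # Only accept a pair whose two files share the same extension.
        if all(os.path.exists(path) for path in pair):
            return pair
    raise FileNotFoundError(
        f"no complete read pair for {sample!r} in {fq_dir!r} (tried {extensions})"
    )


# Assumed use inside the Snakefile, replacing both copied rules with one:
#
# rule trim_galore_PE:
#     input:
#         lambda wildcards: find_fastq_pair(config['fq_in_path'], wildcards.sample)
#     output:
#         r1=join(config['trim_out_path'], '{sample}1_val_1.fq.gz'),
#         r2=join(config['trim_out_path'], '{sample}2_val_2.fq.gz'),
#     params:
#         out_path=config['trim_out_path'],
#     conda:
#         'envs/biotools.yaml',
#     shell:
#         'trim_galore --gzip -o {params.out_path} --paired {input}'
```

Because the function checks the file system while Snakemake is building the job graph, a sketch like this only suits raw inputs that already exist before the workflow starts, not files produced by an upstream rule.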