I have two Nextflow scripts that work independently. How do I combine them into one so that the output files from process TRIM are used as input files for process BWA? The BWA process must not start until the paired trimmed files have been created.
Both scripts use paired files (one for each mate of paired-end sequencing), so I am not sure how to use Channel.fromFilePairs to chain the two scripts into one.
Nextflow script 1: this script trims reads and stores the output in trimmed/
#!/usr/bin/env nextflow
nextflow.enable.dsl=2

process TRIM {
    debug true
    publishDir 'trimmed/', mode: 'copy', overwrite: false, pattern: "*"

    input:
    tuple val(sampleId), file(reads)

    output:
    path "*"

    script:
    """
    trim_galore --paired -q 30 --length 30 --fastqc --phred33 $reads
    """
}

workflow {
    Channel.fromFilePairs("$baseDir/subset_fq/*_{1,2}.fq.gz", checkIfExists:true) \
        | TRIM
}
Nextflow script 2: this script uses the trimmed reads from trimmed/ and outputs SAM files in bwa_aligned/
#!/usr/bin/env nextflow
nextflow.enable.dsl=2

process BWA {
    debug true
    publishDir 'bwa_aligned/', mode: 'copy', overwrite: false, pattern: "*"

    input:
    tuple val(sampleId), file(reads)

    output:
    path "*"

    script:
    """
    bwa mem /exports/eddie/scratch/pdewari/hamour/genome/fEpiCoi_cnag1_curated_primary.no_mt.fa $reads -t 2 > ${sampleId}.sam
    """
}

workflow {
    Channel.fromFilePairs("$baseDir/trimmed/*_{1,2}_val_{1,2}.fq.gz", checkIfExists:true) | BWA
}
The Nextflow documentation is thorough and full of good examples. I highly recommend checking it whenever you get stuck, and for best practices.
In DSL2, the workflow block is where you pass files from one process to another. First remove the workflow blocks from scripts 1 and 2, then create a separate workflow script (you can also put everything into one script, but I find a modular layout easier to modify and extend).
Script 1 becomes this:
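A minimal sketch of what the revised module might look like, based on the changes described in this answer. It assumes the module is saved under the hypothetical name trim.nf, and that trim_galore names its paired outputs *_val_1.fq.gz and *_val_2.fq.gz, as the glob in the question's second script suggests.

```groovy
// trim.nf (hypothetical file name): module containing only the TRIM process,
// with the workflow block removed so it can be included elsewhere.
nextflow.enable.dsl=2

process TRIM {
    // Label each task with its sample ID in the execution log
    tag "$sampleId"
    publishDir 'trimmed/', mode: 'copy', overwrite: false

    input:
    // 'path' replaces the deprecated 'file' qualifier
    tuple val(sampleId), path(reads)

    output:
    // Emit a tuple shaped like the input: sample ID plus the trimmed pair.
    // Assumption: trim_galore writes *_val_1.fq.gz and *_val_2.fq.gz.
    tuple val(sampleId), path("*_val_{1,2}.fq.gz")

    script:
    """
    trim_galore --paired -q 30 --length 30 --fastqc --phred33 ${reads}
    """
}
```

With the narrower glob, only the trimmed FASTQ files enter the output channel. The FastQC and trimming reports stay in the task's work directory unless they are declared as additional outputs.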
I've made some changes following the documentation and to make it easier to follow. Firstly, the file qualifier is deprecated and path is preferred. I've also added the tag directive so you can see which process is running on each sample. And finally, the output is a tuple that mimics the input structure. Also, using the * wildcard is dangerous, as you're creating an output channel with each and every file in the working directory, not just the ones you want.
Script 2 becomes:
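A minimal sketch of the revised alignment module, assuming it is saved under the hypothetical name bwa.nf. The genome and index inputs are illustrative names introduced here. The index input assumes the BWA index files (.amb, .ann, .bwt, .pac, .sa) sit next to the FASTA and need to be staged with it, since bwa mem looks for them beside the reference it is given.

```groovy
// bwa.nf (hypothetical file name): module containing only the BWA process.
nextflow.enable.dsl=2

process BWA {
    tag "$sampleId"
    publishDir 'bwa_aligned/', mode: 'copy', overwrite: false

    input:
    tuple val(sampleId), path(reads)
    // Reference FASTA, passed in from the workflow instead of hard-coded
    path genome
    // Assumption: BWA index files, staged so they sit beside the FASTA
    path index

    output:
    // Declare the SAM file explicitly rather than matching "*"
    tuple val(sampleId), path("${sampleId}.sam")

    script:
    """
    bwa mem -t 2 ${genome} ${reads} > ${sampleId}.sam
    """
}
```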
It's also not best practice to give Nextflow absolute paths. The fEpiCoi_cnag1_curated_primary.no_mt.fa file should be staged in each processing environment as a value channel.
And then finally, the workflow script.
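A minimal sketch of such a workflow script, under several assumptions: the two processes live in hypothetical module files trim.nf and bwa.nf beside it, TRIM emits a single channel of (sampleId, trimmed reads) tuples, and BWA takes that tuple plus the reference FASTA and its index files as inputs. The params.reads and params.genome names are illustrative.

```groovy
#!/usr/bin/env nextflow
nextflow.enable.dsl=2

// Hypothetical module file names; adjust to wherever the processes are saved
include { TRIM } from './trim.nf'
include { BWA }  from './bwa.nf'

// Illustrative parameters; both can be overridden with --reads and --genome
params.reads  = "$baseDir/subset_fq/*_{1,2}.fq.gz"
params.genome = '/exports/eddie/scratch/pdewari/hamour/genome/fEpiCoi_cnag1_curated_primary.no_mt.fa'

workflow {
    // One (sampleId, [read1, read2]) tuple per sample
    reads_ch = Channel.fromFilePairs(params.reads, checkIfExists: true)

    // Plain values passed to a process are treated as value channels,
    // so the same reference is reused for every sample
    genome = file(params.genome, checkIfExists: true)
    // Assumption: the BWA index files share the FASTA name as a prefix
    index  = file("${params.genome}.*")

    TRIM(reads_ch)
    // TRIM.out is the single output channel of TRIM
    BWA(TRIM.out, genome, index)
}
```

Because BWA consumes the tuples emitted by TRIM, each alignment task starts only after the trimmed pair for that sample exists. Nothing needs to be read back from trimmed/.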
NB. None of these scripts have been tested.