how to chain Channel.fromFilePairs from one process to another process

164 views Asked by At

I have two nextflow scripts that work independently, how do I combine them into one such that output files from process TRIM can be used as input files for process BWA. The process BWA must not start until paired trimmed files have been created.

Both scripts use paired files (for each mate of paired end sequencing), so not sure how to use Channel.fromFilePairs for chaining the two scripts into one.

nextflow script 1 this script trims reads and stores the output in trimmed/

#!/usr/bin/env nextflow
nextflow.enable.dsl=2

process TRIM {
  debug true

  publishDir 'trimmed/', mode: 'copy', overwrite: false, pattern: "*"

  input:
  tuple val(sampleId), file(reads)
  
  output:
  path "*"

  script:
  """
  trim_galore --paired -q 30 --length 30 --fastqc --phred33 $reads
  """
}

workflow {
  Channel.fromFilePairs("$baseDir/subset_fq/*_{1,2}.fq.gz", checkIfExists:true) \
    | TRIM
}

nextflow script 2 this script uses trimmed reads from trimmed/ and outputs sam files in bwa_aligned/

#!/usr/bin/env nextflow
nextflow.enable.dsl=2

process BWA {
  debug true

  publishDir 'bwa_aligned/', mode: 'copy', overwrite: false, pattern: "*"

  input:
  tuple val(sampleId), file(reads)

  output:
  path "*"

  script:
  """
  bwa mem /exports/eddie/scratch/pdewari/hamour/genome/fEpiCoi_cnag1_curated_primary.no_mt.fa $reads -t 2 > ${sampleId}.sam
  """
}

workflow {
  Channel.fromFilePairs("$baseDir/trimmed/*_{1,2}_val_{1,2}.fq.gz", checkIfExists:true) | BWA
}

1

There are 1 answers

7
dthorbur On BEST ANSWER

The nextflow documentation shows lots of good examples and is very thorough, and I highly recommend checking that out whenever you get stuck and for best practices.

The workflow invocation is how you can send files from one process to another in DSL2 nextflow. But you need to remove the workflow invocations from script 1 and 2, and then create a workflow script (you can also put everything into one script, but I find it's easier to modify and add if its modular).

Script 1 becomes this:

#!/usr/bin/env nextflow

process TRIM {
  debug true
  tag "$sampleId"

  publishDir 'trimmed/', mode: 'copy', overwrite: false

  input:
  tuple val(sampleId), path(reads)
  
  output:
  tuple val(sampleId), path(*_val_1.fq.gz), path(*_val_2.fq.gz), emit: trimmed_seqs
  // you can add an output channel for fastqc here, as it doesn't need to be in the same channel as the reads. 

  script:
  """
  trim_galore --paired -q 30 --length 30 --fastqc --phred33 $reads
  """
}

I've made some changes following the documentation and to make it easier to follow. Firstly, the file qualifier is deprecated and path is preferred. I've also added the tag so you can see which process is running on each sample. And finally, the output is a tuple, that mimics the input structure. Also, using * wildcard is dangerous as you're creating an output channel with each and every file in the working directory, not just the ones you want.

Script 2 becomes:

#!/usr/bin/env nextflow

process BWA {
  tag "$sampleId"
  debug true

  publishDir 'bwa_aligned/', mode: 'copy', overwrite: false, pattern: "*"

  input:
  // your original input declaration didn't match the cardinality of your TRIM process output. 
  tuple val(sampleId), path(r1), path(r2)

  output:
  path "*" // I would update this to be less promiscuous

  script:
  """
  bwa mem /exports/eddie/scratch/pdewari/hamour/genome/fEpiCoi_cnag1_curated_primary.no_mt.fa $reads -t 2 > ${sampleId}.sam
  """
}

It's also not best practice to give nextflow absolute paths. The fEpiCoi_cnag1_curated_primary.no_mt.fa file should be staged in each processing environment as a value channel.

And then finally, the workflow script.

#!/usr/bin/env nextflow
nextflow.enable.dsl=2

// These paths are relative to the workflow .nf file
include { TRIM } from './modules/trim.nf'
include { BWA } from './modules/bwa.nf'

Channel
  .fromFilePairs("$baseDir/subset_fq/*_{1,2}.fq.gz", checkIfExists:true)
  .set{ paired_reads }

workflow Mapping {
  main:
    TRIM(paired_reads)
    BWA(TRIM.out.trimmed_seqs)
}

NB. None of these scripts have been tested.