Nextflow error: Missing output files in a process

I'm running into another strange issue with my script and was hoping to get some more help. I am trying to do genome assembly using several tools with the script below:

/*
 * Define the default parameters
 */ 

params.podsF = '/home/staukobong/pod5/'
pods_ch = Channel.fromPath(params.podsF, checkIfExists: true)


/*
 * Basecalling POD5 files using Dorado
 */


process BASECALL {

    module 'dorado'
    debug true

    input:
    path sample_id

    output:
    path 'sample_id.bam' , emit: bamfiles_complete

    script:
    """
    dorado basecaller /home/staukobong/[email protected] $sample_id > sample_id.bam
    """
}


/*
 * Convert bam files to fastq files
 */


process CONVERT {

    debug true

    input:
    path sample_id

    output:
    path 'sample_id.fastq', emit: fastq_files

    script:
    """
    samtools bam2fq $sample_id > sample_id.fastq
    """
}


/*
 * Check quality of sequencing reads using FASTQC
 */

process FASTQC1 {

    module 'FastQC'
    debug true

    input:
    path sample_id

    output:
    path 'sample_id_fastqc.html', emit: fastqc_files

    script:
    """
    fastqc $sample_id -t 4
    """

}


/*
 * Trim fastq files after base calling using Nanofilt
 */

process TRIM {

    module 'nanofilt'
    debug true

    input:
    path sample_id

    output:
    path 'sample_id.trimmed.fastq', emit: trimmed_fastq

    script:
    """
    NanoFilt -l 200 -q 20 --headcrop 50 --tailcrop 6000 $sample_id > sample_id.trimmed.fastq
    """
}


/*
 * Check quality of trimmed reads using FASTQC
 */

process FASTQC2 {

    module 'FastQC'
    debug true

    input:
    path sample_id2

    output:
    path 'sample_id2_fastqc.html', emit: fastqc_files2

    script:
    """
    fastqc $sample_id2 -t 4
    """

}


/*
 * Assemble the reads using FLYE
 */

process ASSEMBLY {

    module 'flye/2.9'
    debug true

    input:
    path sample_id

    output:
    path '*', emit: Assembly_files

    script:
    """
    flye --nano-raw $sample_id -i 3 -t 4
    """

}

/*
 * Mapping the reads using minimap2
 */

process MAPPINGS {

    module 'minimap2'
    debug true

    input:
    path sample_id

    output:
    path 'sample_id.sam', emit: Mapped_files

    script:
    """
    minimap2 -a -t 4 ${sample_id}.trimmed.fastq ${sample_id}.fasta > sample_id.sam
    """

}


/*
========================================================================================
                                Create default workflow
========================================================================================
*/

workflow {
    BASECALL(pods_ch)
    CONVERT(BASECALL.out.bamfiles_complete)
    FASTQC1(CONVERT.out.fastq_files)
    TRIM(CONVERT.out.fastq_files)
    FASTQC2(TRIM.out.trimmed_fastq)
    ASSEMBLY(TRIM.out.trimmed_fastq)
    MAPPINGS(TRIM.out.trimmed_fastq.combine(ASSEMBLY.out.Assembly_files))


}

The script works until it reaches the fifth process (FASTQC2). At that point it stops with the following error, even though the output file is generated:

    `ERROR ~ Error executing process > 'FASTQC2 (1)'

Caused by:
Missing output file(s) sample_id2_fastqc.html expected by process FASTQC2 (1)

Command executed:

fastqc sample_id.trimmed.fastq -t 4

Command exit status:
0

Command output:
Analysis complete for sample_id.trimmed.fastq

Command error:
Started analysis of sample_id.trimmed.fastq
Approx 5% complete for sample_id.trimmed.fastq
Approx 10% complete for sample_id.trimmed.fastq
Approx 20% complete for sample_id.trimmed.fastq
Approx 25% complete for sample_id.trimmed.fastq
Approx 35% complete for sample_id.trimmed.fastq
Approx 40% complete for sample_id.trimmed.fastq
Approx 50% complete for sample_id.trimmed.fastq
Approx 55% complete for sample_id.trimmed.fastq
Approx 65% complete for sample_id.trimmed.fastq
Approx 70% complete for sample_id.trimmed.fastq
Approx 80% complete for sample_id.trimmed.fastq
Approx 85% complete for sample_id.trimmed.fastq
Approx 95% complete for sample_id.trimmed.fastq
Work dir:
/home/staukobong/work/89/9ecd8124e69bdb6ad772d423711c87

Tip: when you have fixed the problem you can continue the execution adding the option -resume to the run command line

-- Check '.nextflow.log' file for details`

I'm not sure what the problem is, because the first FastQC process and the other processes work. Could I please get some assistance with this? I would greatly appreciate it. Thank you.

There is 1 answer

Pallie

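FastQC strips the .fastq extension from its input and uses what remains as the basename for its output files. So when you feed it sample_id.trimmed.fastq, the basename is sample_id.trimmed and the HTML report is written to sample_id.trimmed_fastqc.html. This is also why FASTQC1 works: its input is sample_id.fastq, so the report is sample_id_fastqc.html, which matches the output that process declares.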

You've told your FASTQC2 process to expect a file called sample_id2_fastqc.html:

Caused by: Missing output file(s) sample_id2_fastqc.html expected by process FASTQC2 (1)

Change the output: block of the FASTQC2 process to exactly match the filename FastQC produces (here, sample_id.trimmed_fastqc.html).
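As a rough sketch of what that could look like, the process below derives the expected report name from the input file instead of hard-coding it. The input name `reads` is a hypothetical rename of `sample_id2`, and using `baseName` assumes the input has a single trailing extension to drop (as sample_id.trimmed.fastq does); everything else is taken from the question.

```nextflow
process FASTQC2 {

    module 'FastQC'
    debug true

    input:
    path reads   // hypothetical name; was sample_id2 in the question

    output:
    // baseName drops only the last extension:
    // sample_id.trimmed.fastq -> sample_id.trimmed
    // so this expects sample_id.trimmed_fastqc.html
    path "${reads.baseName}_fastqc.html", emit: fastqc_files2

    script:
    """
    fastqc $reads -t 4
    """
}
```

If the inputs might be compressed (for example .fastq.gz), `baseName` would probably no longer line up with the name FastQC chooses. In that case a glob such as `path '*_fastqc.html'` in the output: block may be the simpler option.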