How do I loop through files and combine them based on a species designation? - Bash

I'm trying to merge fastq.gz files based on species, and I'd like to do it without explicitly naming the species so that I can reuse the same bash script for other groups of species later. I am relatively unfamiliar with bash, so this may be a basic question.

The file names look like this:

GSF3164-Moyle-107-6_L_S75_R1_001.fastq.gz
GSF3164-Moyle-107-6_L_S75_R2_001.fastq.gz
GSF3164-Moyle-107-7_F_S48_R1_001.fastq.gz
GSF3164-Moyle-107-7_F_S48_R2_001.fastq.gz
GSF3164-Moyle-107-7_L_S76_R1_001.fastq.gz
GSF3164-Moyle-107-7_L_S76_R2_001.fastq.gz
GSF3164-Moyle-1322-10_F_S44_R1_001.fastq.gz
GSF3164-Moyle-1322-10_F_S44_R2_001.fastq.gz
GSF3164-Moyle-1322-10_L_S96_R1_001.fastq.gz
GSF3164-Moyle-1322-10_L_S96_R2_001.fastq.gz
GSF3164-Moyle-1322-1_F_S42_R1_001.fastq.gz
GSF3164-Moyle-1322-1_F_S42_R2_001.fastq.gz

The species designations in these files are 107 and 1322. What loop would work for automatically combining files with these names?

I was generally thinking that it should look something like this:

for SPECIES in GSF3164-Moyle-SPECIES*
do
    cat GSF3164-Moyle-SPECIES* > otherFolder/SPECIES.fastq.gz
done

I don't know what I should be putting in the for loop and how to designate each species.

Thank you for your time.

Answer by markp-fuso (best answer):

Making some minor changes to your current code:

for fname in GSF3164-Moyle-*.fastq.gz
do
    # split fname on the "-" delimiter; we only need the 3rd field (the numeric species ID)
    IFS='-' read -r _ _ species _ <<< "${fname}"
    # append to the single combined file for that species
    # (otherFolder must already exist; running the loop a second time appends the data again)
    cat "${fname}" >> otherFolder/"${species}".fastq.gz
done
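
If these are paired-end reads, you may want the R1 and R2 files kept apart rather than mixed into one file per species. Here is a rough sketch of that variation, assuming every file name ends in `_R1_001.fastq.gz` or `_R2_001.fastq.gz` as in your listing. The function name `combine_by_species` and the `<species>_R1.fastq.gz` / `<species>_R2.fastq.gz` output names are my own choices, not anything your pipeline requires.

```shell
#!/usr/bin/env bash

# Hypothetical helper (name and output naming are illustrative assumptions).
# Usage: combine_by_species <input_dir> <output_dir>
#   e.g. combine_by_species . otherFolder
combine_by_species() {
    local in_dir=$1 out_dir=$2
    local fname base species read_tag

    mkdir -p "${out_dir}"
    # Clear combined files from an earlier run so ">>" does not duplicate data.
    rm -f "${out_dir}"/*_R[12].fastq.gz

    for fname in "${in_dir}"/GSF3164-Moyle-*_R[12]_001.fastq.gz
    do
        [ -e "${fname}" ] || continue              # glob matched nothing

        base=${fname##*/}                          # strip the directory part
        IFS='-' read -r _ _ species _ <<< "${base}"   # 3rd "-" field = species ID

        read_tag=${base%_001.fastq.gz}             # drop the fixed suffix
        read_tag=${read_tag##*_}                   # keep the last "_" field: R1 or R2

        cat "${fname}" >> "${out_dir}/${species}_${read_tag}.fastq.gz"
    done
}
```

Concatenating gzip files with `cat` gives a valid multi-member gzip file, so nothing needs to be decompressed first. The glob expands in sorted order and each R1/R2 pair differs only in the `R1`/`R2` tag, so both combined files should list the samples in the same order, but it is worth checking that against your real file names before feeding the output to a tool that relies on read pairing.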