Combine a directory of GVCF files with gatk CombineGVCFs

I've produced a set of about 400 GVCF files with gatk HaplotypeCaller, using the -ERC GVCF option. I'd now like to combine them for downstream genotyping and variant recalibration. I believe I can do this with gatk CombineGVCFs:

gatk CombineGVCFs \
   -R reference.fasta \
   --variant sample1.g.vcf.gz \
   --variant sample2.g.vcf.gz \
   -O cohort.g.vcf.gz

What I don't know is how to pass all 400 GVCF files to CombineGVCFs. I've heard this can be done with the --arguments_file option, but I don't know how to build such a file.

Any help gratefully received!

There is 1 answer

Vincent

First, create a text file listing all the GVCFs you want to combine:

ls gvcfs/*.g.vcf.gz > gvcfs.list
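Before handing the list to GATK, it may be worth confirming that it holds the expected number of entries and that every path in it points to a real, non-empty file. A minimal sketch of such a check follows; the function name check_gvcf_list is hypothetical and not part of GATK.

```shell
# Hypothetical helper (not a GATK command): report how many paths the list
# contains and how many of them are missing or empty on disk.
check_gvcf_list() {
    local list_file=$1 missing=0 path
    while IFS= read -r path; do
        [ -n "$path" ] || continue          # ignore blank lines
        if [ ! -s "$path" ]; then           # -s: file exists and is not empty
            echo "missing or empty: $path" >&2
            missing=$((missing + 1))
        fi
    done < "$list_file"
    echo "$(grep -c . "$list_file") entries, $missing missing"
    [ "$missing" -eq 0 ]                    # non-zero exit status if anything is missing
}
```

Calling check_gvcf_list gvcfs.list prints a one-line summary and returns a non-zero status if any listed file is absent, so it can gate the CombineGVCFs run in a script.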

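Then pass the list to CombineGVCFs through --variant (GATK expands a file ending in .list into the input paths it contains). Here $ref and $DBSNP are shell variables holding the paths to the reference FASTA and the dbSNP VCF: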

gatk --java-options "-Xmx180G -XX:ParallelGCThreads=36" CombineGVCFs \
   -R "$ref" \
   --variant gvcfs.list \
   --dbsnp "$DBSNP" \
   -O combined_gvcf.vcf
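The question also asks how to build a file for --arguments_file. A minimal sketch follows, assuming the arguments file is read as extra command-line arguments, so each line carries one "--variant <path>" pair. The function name build_args_file and the file name cohort.args are hypothetical, and the directory and .g.vcf.gz suffix are taken from the question.

```shell
# Hypothetical helper: write one "--variant <path>" line per GVCF in a directory.
build_args_file() {
    local gvcf_dir=$1 args_file=$2 f
    : > "$args_file"                        # create or truncate the output file
    for f in "$gvcf_dir"/*.g.vcf.gz; do
        [ -e "$f" ] || continue             # skip the literal pattern when nothing matches
        printf -- '--variant %s\n' "$f" >> "$args_file"
    done
}

# Intended use (left commented out so the sketch can be sourced without GATK installed):
# build_args_file gvcfs cohort.args
# gatk CombineGVCFs \
#    -R reference.fasta \
#    --arguments_file cohort.args \
#    -O cohort.g.vcf.gz
```

Either route should give CombineGVCFs the same inputs; the .list file is shorter to write, while the arguments file can carry other options alongside the --variant entries.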