I've produced a set of about 400 GVCF files with gatk HaplotypeCaller, using the -ERC GVCF
option. I'd now like to combine them for downstream genotyping and variant recalibration. I believe I can combine them with gatk CombineGVCFs:
gatk CombineGVCFs \
-R reference.fasta \
--variant sample1.g.vcf.gz \
--variant sample2.g.vcf.gz \
-O cohort.g.vcf.gz
What I don't know is how to pass all 400 GVCF files to CombineGVCFs. I've heard this can be done with the --arguments_file
option, but I don't know how to build such a file.
Any help gratefully received!
First, you need to create a text file listing all the GVCFs you want to combine, one path per line.
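One way to build such a list is sketched below. It assumes the GVCFs all sit somewhere under the current directory and end in .g.vcf.gz; the output name gvcfs.list is just an illustrative choice, although as far as I know GATK only treats a file as a list of inputs when it has the .list extension.

```shell
# Assumption: every GVCF to combine lives under the current directory
# and is named *.g.vcf.gz. Adjust the search path and pattern otherwise.
# gvcfs.list is a hypothetical name; keep the .list extension.
find "$PWD" -type f -name '*.g.vcf.gz' | sort > gvcfs.list

# Sanity check: this should print the number of GVCFs you expect (about 400).
wc -l < gvcfs.list
```

Using "$PWD" writes absolute paths, so the list still works if the combine step is run from another directory.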
Then pass that file to CombineGVCFs.
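A sketch of the combine step, reusing the command from the question with the individual --variant lines swapped for a single list file. The name gvcfs.list is an assumption (any file with one GVCF path per line and a .list extension should do), and the guard around the call is only there so the snippet fails gently when gatk or the list is missing.

```shell
# Assumptions: gvcfs.list holds one GVCF path per line, and
# reference.fasta is the reference used for HaplotypeCaller.
if command -v gatk >/dev/null 2>&1 && [ -s gvcfs.list ]; then
    gatk CombineGVCFs \
        -R reference.fasta \
        --variant gvcfs.list \
        -O cohort.g.vcf.gz
else
    echo "gatk is not on PATH, or gvcfs.list is missing or empty" >&2
fi
```

With this form there should be no need for --arguments_file at all, since the list file stands in for the 400 separate --variant arguments.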