How can I retrieve all variants for only a specific gene from a VCF file using bcftools?


I have downloaded the complete ClinVar VCF file for the human genome (GRCh38). I want to extract all variants for the TP53 gene only. For now, I have used the following Linux commands. Could anyone please let me know how I can achieve the same result using bcftools?

zcat clinvar.vcf.gz | awk ' { if (substr($1,1,1) == "#" ) print $0 }' > only_tp53.vcf
zcat clinvar.vcf.gz | grep "GENEINFO=TP53:" >> only_tp53.vcf

I took the genomic coordinates for TP53 from the NCBI Gene database and tried the following command, but it does not retrieve all the variants; a few are missing.

bcftools view -r 17:7668421-7687490 clinvar.vcf.gz > clinvar_tp53_variants.vcf

Answer by Pierre:

On GRCh38 (hg38), TP53 is located at chr17:7668421-7687490 ( https://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&knownGene=pack&position=chr17:7668421-7687490&hgFind.matches=ENST00000269305.9 ).

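So, using bcftools (region queries need an index; ClinVar's VCF names chromosomes without the "chr" prefix, so the region is written as 17:... rather than chr17:...):

bcftools index clinvar.vcf.gz
bcftools view clinvar.vcf.gz "17:7668421-7687490"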