I have downloaded the complete VCF file for the human genome (GRCh38) from ClinVar. I want to keep only the variants for the TP53 gene. For now, I have used the following Linux commands. Could anyone please let me know how I can achieve the same result using bcftools?
zcat clinvar.vcf.gz | awk ' { if (substr($1,1,1) == "#" ) print $0 }' > only_tp53.vcf
zcat clinvar.vcf.gz | grep "GENEINFO=TP53:" >> only_tp53.vcf
I took the genomic coordinates for TP53 from the NCBI Gene database and tried the following command, but it does not retrieve all the variants; a few are missing.
bcftools view -r 17:7668421-7687490 clinvar.vcf.gz > clinvar_tp53_variants.vcf
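One way to see which records the region query leaves out is to list the TP53-tagged records whose POS falls outside the queried window. The following is only a minimal sketch using awk on an uncompressed VCF stream. The function names `tp53_outside_window` and `toy_vcf` and the toy records are made up for illustration (they are not real ClinVar entries), and it assumes the chromosome is named "17" and that GENEINFO uses the `SYMBOL:ID|SYMBOL:ID` layout.

```shell
# Print CHROM, POS and ID of records whose GENEINFO lists TP53 but whose
# POS is outside 17:7668421-7687490. Reads an uncompressed VCF on stdin.
# The pattern also matches TP53 when it is not the first gene in GENEINFO.
tp53_outside_window() {
  awk -F'\t' -v start=7668421 -v end=7687490 '
    /^#/ { next }                                   # skip header lines
    $8 ~ /(^|;)GENEINFO=([^;]*[|])?TP53:/ &&
      ($1 != "17" || $2 + 0 < start + 0 || $2 + 0 > end + 0) {
        print $1, $2, $3
    }'
}

# Hypothetical toy input, only to exercise the function.
toy_vcf() {
  printf '##fileformat=VCFv4.1\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf '17\t7661779\ttoy1\tG\tA\t.\t.\tGENEINFO=TP53:7157\n'
  printf '17\t7670000\ttoy2\tC\tT\t.\t.\tGENEINFO=TP53:7157\n'
  printf '17\t7687500\ttoy3\tA\tG\t.\t.\tGENEINFO=WRAP53:55135|TP53:7157\n'
  printf '17\t7690000\ttoy4\tT\tC\t.\t.\tGENEINFO=WRAP53:55135\n'
}

toy_vcf | tp53_outside_window
```

On the real file this could be run as `zcat clinvar.vcf.gz | tp53_outside_window`. Any records it prints are tagged TP53 but start outside the coordinates given to `-r`, which would be one possible explanation for the difference between the two approaches.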
TP53 is "chr17:7668421-7687490" ( https://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&knownGene=pack&position=chr17:7668421-7687490&hgFind.matches=ENST00000269305.9 )
so using bcftools: