vcf-consensus script error: The sequence N not found in the fasta file


I am trying to run the vcf-consensus script on a simple example, but I get this error: The sequence "7" not found in the fasta file.

The usage is:

Usage: cat ref.fa | vcf-consensus [OPTIONS] in.vcf.gz > out.fa

My FASTA file is:

TGGCTGGAACGGGACCTCACATTCTGTATTTGTCCCGATTGGCTAGCAACTTAGAACTTT

And my VCF file is:

##fileformat=VCFv4.1
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  SAMPLE
7   1   .   T   A   .   .   .   GT  0/1
7   2   .   G   A   .   .   .   GT  0/1
7   3   .   G   A   .   .   .   GT  0/1
7   4   .   C   A   .   .   .   GT  0/1

I compress the VCF file with bgzip and index it with tabix:

bgzip vcfFile.vcf
tabix -p vcf vcfFile.vcf.gz
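As a hedged sketch, the compress-and-index step can be wrapped in a small function that also prints the sequence names stored in the tabix index (`tabix -l`). Those are the CHROM values from the VCF, i.e. the names that must also appear in the FASTA. The function name `prepare_vcf` is hypothetical, and the sketch assumes bgzip and tabix from htslib are on the PATH and that the VCF is tab-delimited. Note that `-p` takes a preset name (`vcf`) before the file name.

```shell
#!/bin/sh
# Hypothetical helper: compress a plain-text VCF with bgzip, index it with
# tabix using the "vcf" preset, then print the sequence names in the index.
# ASSUMPTION: bgzip and tabix (htslib) are installed and on the PATH.
prepare_vcf() {
    vcf="$1"                                  # e.g. vcfFile.vcf (tab-delimited)
    bgzip -f "$vcf" || return 1               # writes "$vcf.gz", replacing an old copy
    tabix -f -p vcf "$vcf.gz" || return 1     # -p needs a preset name: vcf
    tabix -l "$vcf.gz"                        # one sequence name per line, e.g. 7
}
```

With the question's file this would be called as `prepare_vcf vcfFile.vcf`.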

And then, I execute:

cat fastaFile.fa | vcf-consensus vcfFile.vcf.gz > out.fa

I get this error: The sequence "7" not found in the fasta file.

Does anyone know what is wrong?

Thanks.

Best answer, by Pierre:

Your VCF only contains the chromosome '7' in column 1 (CHROM),

but your FASTA header is:

>gi|157696558|ref|NW_001838997.1| Homo sapiens chromosome 7 genomic scaffold, alternate assembly HuRef SCAF_1103279187418, whole genome shotgun sequence
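A minimal sketch for spotting this kind of mismatch before running vcf-consensus: list the distinct CHROM values in the VCF and report any that are not a FASTA sequence name. It assumes (not checked against the vcf-consensus source) that the sequence name is the first whitespace-delimited word after ">" on each header line. `gzip -dc` is used because bgzip output is ordinary gzip; the function name `missing_contigs` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the CHROM values used in a (b)gzipped VCF that do
# not match any FASTA sequence name. ASSUMPTION: the sequence name is the first
# whitespace-delimited word after ">" on each header line.
missing_contigs() {
    vcf_gz="$1"     # e.g. vcfFile.vcf.gz
    fasta="$2"      # e.g. fastaFile.fa
    work_dir=$(mktemp -d) || return 1
    # Distinct CHROM values (column 1) from the VCF body, skipping "#" lines.
    gzip -dc "$vcf_gz" | grep -v '^#' | cut -f1 | sort -u > "$work_dir/vcf_names"
    # Distinct FASTA names: drop ">" and keep only the first word.
    grep '^>' "$fasta" | sed 's/^>//' | awk '{print $1}' | sort -u > "$work_dir/fa_names"
    # comm -23 keeps lines found only in the first (VCF) list.
    comm -23 "$work_dir/vcf_names" "$work_dir/fa_names"
    rm -rf "$work_dir"
}
```

For the files in the question, `missing_contigs vcfFile.vcf.gz fastaFile.fa` would be expected to print `7`, because the only FASTA name is `gi|157696558|ref|NW_001838997.1|`.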

vcf-consensus would work if your FASTA header were just:

>7
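For a single-sequence FASTA like the one in the question, one hedged way to get a `>7` header is to rewrite only the first line and leave the sequence lines untouched. The function name `rename_header` and the output file name `fastaFile.chr7.fa` are hypothetical; awk is used instead of sed so that the new name needs no escaping.

```shell
#!/bin/sh
# Hypothetical helper: replace the header of a single-sequence FASTA with a new
# name so it matches the VCF CHROM column (here "7"). Only line 1 is changed,
# and only if it is a ">" header; every other line is copied as-is.
rename_header() {
    in_fa="$1"      # e.g. fastaFile.fa
    new_name="$2"   # e.g. 7
    out_fa="$3"     # e.g. fastaFile.chr7.fa
    awk -v name="$new_name" 'NR == 1 && /^>/ { print ">" name; next } { print }' \
        "$in_fa" > "$out_fa"
}
```

Usage would then be `rename_header fastaFile.fa 7 fastaFile.chr7.fa` followed by `cat fastaFile.chr7.fa | vcf-consensus vcfFile.vcf.gz > out.fa`. The opposite fix, changing the VCF CHROM column to match the FASTA name, should also work in principle, but the long header name is awkward to carry through the VCF.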