problems executing tajD test using pegas in R

144 views Asked by At

I am trying to calculate tajD from SNP data (~4000SNPs). I have both fasta and vcf files, and initially tried using my fasta file. With help, I got to the point that i have a script that works to sample from the file, but I needed to split the fasta file into separate populations. I got scared to do this (though I have have to given the error that I am getting now), so I phased my vcf file, and am hoping to using this instead. I'm using the R package pegas, and am getting the following error.

> rm(list=ls(all=T))
> library("ape")
> library("ade4")
> library("adegenet")
> library("pegas")

> b8c18FromVcf <- read.vcf("b8c18_2phased.vcf")

File apparently not yet accessed: Scanning file b8c18_2phased.vcf 3.194102 / 3.194102 Mb Done. Reading 4074 / 4074 loci. Done.

getting haplotypes

> b8c18haplos <- haplotype(b8c18FromVcf)

Analysing individual no. 186 / 186

tajD from haplos

> tajd <- tajima.test(b8c18haplos)

Warning message: In tajima.test(b8c18haplos) : Tajima test requires at least 4 sequences

I will attach links to both my phased and unphased files here. https://drive.google.com/open?id=0B6qb8IlaQGFZTmQ1YXRVbnFSRzA https://drive.google.com/open?id=0B6qb8IlaQGFZQm9HZjZSUkE3NEU

Any thoughts?

Lastly, I am wondering if there is a way to subset populations within the tajD command. I have 7 populations in the same vcf file, and I should calculate tajD for each separately. If not, what is the best tool for subsetting vcfs. I have done loads of googling on this, but none seem to be straightforward.

With thanks,

1

There are 1 answers

0
Ella Bowles On

The developer of pegas responded to me to say that Tajima's D requires DNA sequences. It cannot be executed using a vcf file, even if it is phased.