I have a dataframe with loci names in one column and DNA sequences in the other. I'm trying to use as.DNAbin{ape}
or similar to create a DNAbin object.
Here is some example data:
x <- structure(c("55548", "43297", "35309", "34468", "AATTCAATGCTCGGGAAGCAAGGAAAGCTGGGGACCAACTTCTCTTGGAGACATGAGCTTAGTGCAGTTAGATCGGAAGAGCA", "AATTCCTAAAACACCAATCAAGTTGGTGTTGCTAATTTCAACACCAACTTGTTGATCTTCACGTTCACAACCGTCTTCACGTT", "AATTCACCACCACCACTAGCATACCATCCACCTCCATCACCACCACCGGTTAAGATCGGAAGAGCACACTCTGAACTCCAGTC", "AATTCTATTGGTCATCACAATGGTGGTCCGTGGCTCACGTGCGTTCCTTGTGCAGGTCAACAGGTCAAGTTAAGATCGGAAGA"), .Dim = c(4L, 2L))
If I try y <- as.DNAbin(x), R creates a sort of DNAbin object with 4 DNA sequences (the 4 rows of the example) of length 2 (the two columns, I assume). There are no labels, and of course the base composition doesn't work either.
The documentation is not very clear, but after playing with the woodmouse example data of the package, I think that what I need to do is create a matrix with each base as a column and then use as.DNAbin, i.e. in the above example a 4 x 84 matrix (1 column for the locus name and 83 for the sequences?). Any advice on how to do this? Or any better idea?
Thanks
The first parameter of as.DNAbin should be a matrix or a list containing the DNA sequences, or an object of class "alignment". So, your idea is right.

Given that x is the structure from the original post, the code below prepares matrix y:

Then as.DNAbin(y) shows:
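A minimal sketch of one way to prepare such a matrix, assuming the ape package is installed. The particular steps here (strsplit, tolower, do.call(rbind, ...)) and the variable name dna are illustrative choices rather than the only way to do it. Instead of adding an 84th column for the locus name, the sketch stores the names as row names, so that they end up as the sequence labels.

```r
library(ape)  # assumed to be installed; provides as.DNAbin() and base.freq()

# x is the 4 x 2 character matrix from the question:
# column 1 holds the locus names, column 2 holds the sequences.
x <- structure(c("55548", "43297", "35309", "34468", "AATTCAATGCTCGGGAAGCAAGGAAAGCTGGGGACCAACTTCTCTTGGAGACATGAGCTTAGTGCAGTTAGATCGGAAGAGCA", "AATTCCTAAAACACCAATCAAGTTGGTGTTGCTAATTTCAACACCAACTTGTTGATCTTCACGTTCACAACCGTCTTCACGTT", "AATTCACCACCACCACTAGCATACCATCCACCTCCATCACCACCACCGGTTAAGATCGGAAGAGCACACTCTGAACTCCAGTC", "AATTCTATTGGTCATCACAATGGTGGTCCGTGGCTCACGTGCGTTCCTTGTGCAGGTCAACAGGTCAAGTTAAGATCGGAAGA"), .Dim = c(4L, 2L))

# Split each sequence into single lowercase bases and stack the
# resulting vectors as rows. This only gives a clean matrix because
# all four sequences have the same length (83 bases).
y <- do.call(rbind, strsplit(tolower(x[, 2]), ""))

# Use the locus names as row names so they become the labels.
rownames(y) <- x[, 1]

dim(y)            # 4 rows (loci) by 83 columns (bases)

dna <- as.DNAbin(y)
print(dna)        # summary of the DNAbin object, including labels
base.freq(dna)    # base composition now works
```

If the sequences had different lengths, a matrix would not be appropriate. In that case a named list of character vectors (one per locus) can be passed to as.DNAbin instead.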