I want to set the "edge.length" of a phylo object from a variable in a data.frame. The "node.label" and "tip.label" values in the phylo object correspond to the row names of the data.frame. How can edge.length be set from that variable while ensuring that the data is matched correctly, so that each value goes to the node or tip whose node.label or tip.label equals its row name? This is step 3 in the code below.
## R code:
## load ape
library(ape)
## 1. A phylo object:
library(data.tree)
A1 <- Node$new("A1")
B1 <- A1$AddChild("B1")
C1 <- B1$AddChild("C1")
D1 <- C1$AddChild("D1")
E1 <- C1$AddChild("E1")
F1 <- E1$AddChild("F1")
G1 <- E1$AddChild("G1")
H1 <- G1$AddChild("H1")
A1.phylo <- as.phylo.Node(A1)
## 2. A data.frame:
set.seed(1)
df <- as.data.frame(rnorm(7, 5, 3))
names(df) <- "length"
row.names(df) <- c("B1","C1","D1","E1","F1","G1","H1")
## 3. Add the data to A1.phylo$edge.length
A1.phylo$edge.length <- df$length ## wrong!!!
The edge lengths, tip labels and node labels in "phylo" objects are dealt with in the order they appear in the edge table. Therefore, you should always make sure the different elements are in the right order before attributing them.

For example (sorry, I couldn't reproduce your example), take a small tree, test_tree, with four tips. Its edges are all the elements connecting a node (numbers > 4) to another node or to a tip (numbers < 5). You can visualise them (and their numbering) using plot.

So now, if you have a data frame, df, with one length per edge, you can attribute the rows correctly by first putting them in the order of the edge table.
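The steps described above can be sketched as follows. This is only an illustration, not the code originally posted with the answer: the random tree from `rtree(4)`, the layout of `df` (row names holding the edge numbers, i.e. the row numbers of the edge table) and the made-up length values are all assumptions.

```r
library(ape)

## Simulate a random rooted tree with 4 tips: tips are numbered 1-4, nodes 5-7
set.seed(1)
test_tree <- rtree(4)

## Each row of the edge table is one edge: parent node, then child (node or tip)
test_tree$edge

## Plot the tree with node, tip and edge numbers to see which is which
plot(test_tree, show.tip.label = FALSE)
nodelabels()
tiplabels()
edgelabels()

## Hypothetical data frame: row names are edge numbers (rows of the edge table),
## deliberately shuffled so that the order has to be restored
n_edges <- nrow(test_tree$edge)
df <- data.frame(length = seq_len(n_edges) / 10,
                 row.names = as.character(seq_len(n_edges)))
df <- df[sample(n_edges), , drop = FALSE]

## Sort the rows by edge number before attributing the lengths
test_tree$edge.length <- df[order(as.numeric(rownames(df))), "length"]
```

Sorting on `as.numeric(rownames(df))` rather than on the row names themselves avoids the character ordering ("1", "10", "2", ...) that would appear with ten or more edges.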
In this case the sorting is pretty easy since the edge names in df are numeric, but the logic is that the first element in test_tree$edge.length should be the length of the edge connecting node 5 to tip 1, etc.

Again, as your example is not reproducible, it's hard to figure out what's wrong, but I would say your df$length is not the correct length.
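Applied to the objects in the question, the same logic can be expressed by label instead of by number. The following is a minimal sketch, assuming that the conversion with `as.phylo.Node` keeps the internal node names as `node.label`, that all labels are unique, and that each row of `df` holds the length of the edge leading to the node or tip named in its row name. The helper names `all_labels` and `child_labels` are made up for this illustration.

```r
library(ape)
library(data.tree)

## Tree and data frame as in the question
A1 <- Node$new("A1")
B1 <- A1$AddChild("B1")
C1 <- B1$AddChild("C1")
D1 <- C1$AddChild("D1")
E1 <- C1$AddChild("E1")
F1 <- E1$AddChild("F1")
G1 <- E1$AddChild("G1")
H1 <- G1$AddChild("H1")
A1.phylo <- as.phylo.Node(A1)

set.seed(1)
df <- data.frame(length = rnorm(7, 5, 3),
                 row.names = c("B1", "C1", "D1", "E1", "F1", "G1", "H1"))

## Labels indexed by node number: tips come first (1..Ntip), then internal nodes
all_labels <- c(A1.phylo$tip.label, A1.phylo$node.label)

## Label of the child (second column of the edge table) of every edge
child_labels <- all_labels[A1.phylo$edge[, 2]]

## Stop early if any edge has no matching row in df
stopifnot(all(child_labels %in% rownames(df)))

## Look the lengths up by row name, in edge-table order
A1.phylo$edge.length <- df[child_labels, "length"]
```

Indexing `df` by `child_labels` reorders the lengths to follow the edge table whatever the row order of `df` is. The root ("A1") has no incoming edge, which is why it needs no row in `df`.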