Setting the edge.length in a phylo object using a variable in a data.frame

I want to set the "edge.length" in a phylo object using a variable in a data.frame. The "node.label" and "tip.label" values in the phylo object correspond to the row names of the data.frame. How can edge.length be set from that variable while ensuring the data are matched correctly, so that each length is assigned to the edge whose node.label or tip.label equals the row name in the data.frame? In the code below this is step 3.

## R code:
## load ape and data.tree
library(ape)
library(data.tree)

## 1. A phylo object:
A1  <- Node$new("A1")
B1  <- A1$AddChild("B1")
C1  <- B1$AddChild("C1")
D1  <- C1$AddChild("D1")
E1 <- C1$AddChild("E1")
F1 <- E1$AddChild("F1")
G1 <- E1$AddChild("G1")
H1 <- G1$AddChild("H1")
A1.phylo <- as.phylo.Node(A1)


## 2. A data.frame:
set.seed(1)
df <- as.data.frame(rnorm(7, 5, 3))
names(df) <- "length"
row.names(df) <- c("B1","C1","D1","E1","F1","G1","H1")

## 3. Add the data to A1.phylo$edge.length
A1.phylo$edge.length <- df$length ## wrong!!!

There is 1 answer

Thomas Guillerme

The edge lengths in a "phylo" object are stored in the order in which the edges appear in the edge table, and the tip and node labels are stored in the order of the tip and node numbers used in that table. You should therefore always make sure each element is in the right order before assigning it. For example (sorry, I couldn't reproduce your example):

library(ape)
set.seed(1)
## A random tree with 6 edges
test_tree <- rtree(4)

## The edge table
test_tree$edge
#     [,1] [,2]
#[1,]    5    1
#[2,]    5    6
#[3,]    6    2
#[4,]    6    7
#[5,]    7    3
#[6,]    7    4

Here each row is an edge connecting a parent node (first column, numbers > 4) to its child (second column), which is either a tip (numbers < 5) or another internal node. You can visualise them (and their numbering) using plot:

## Visualising all the elements
plot(test_tree, show.tip.label = FALSE)
edgelabels()
nodelabels()
tiplabels()
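
As a text-only complement to the plot, the edge table can be laid out with one row per edge. This is a minimal sketch: the column names (edge, parent, child, child_is_tip, child_label) are chosen here for illustration and are not part of ape. Since rtree() does not create node labels, child_label is NA for edges that lead to an internal node.

```r
library(ape)

set.seed(1)
test_tree <- rtree(4)

## Tips are numbered 1..Ntip; internal nodes come after them
n_tips <- Ntip(test_tree)

## One row per edge, in the same order as test_tree$edge
edge_table <- data.frame(
  edge         = seq_len(nrow(test_tree$edge)),
  parent       = test_tree$edge[, 1],
  child        = test_tree$edge[, 2],
  child_is_tip = test_tree$edge[, 2] <= n_tips
)

## Tip label of the child; indexing past the last tip gives NA for internal nodes
edge_table$child_label <- test_tree$tip.label[edge_table$child]

edge_table
```

The i-th value of test_tree$edge.length belongs to the i-th row of this table.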

So now if you have a dataframe like this:

## A random data frame
df <- as.data.frame(rnorm(6))
names(df) <- "length"
## The edges in the "wrong" order
row.names(df) <- sample(1:6)

You can assign the rows in the correct order by using:

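## Reorder the lengths to follow the row order of test_tree$edge (edges 1 to 6).
## Rows are looked up by name, because df$length itself carries no names.
test_tree$edge.length <- df[as.character(seq_len(nrow(test_tree$edge))), "length"]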

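In this case the reordering is easy because the row names of df are simply the edge numbers. The general logic is that the first element of test_tree$edge.length must be the length of the edge in the first row of test_tree$edge (here, the one connecting node 5 to tip 1), the second element the length of the edge in the second row, and so on.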

Again, as your example is not reproducible, it's hard to figure out what's wrong but I would say your df$length is not the correct length.
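
For the label-based matching asked about in the question, the same logic can be sketched as follows. This is an illustration under assumptions, not a tested fix for the data.tree example: it assumes the tree has both tip.label and node.label filled in with unique values, that each edge is identified by the label of its child, and that the data.frame has one row per non-root label (the root has no edge above it). The toy tree, its labels and the lengths are made up.

```r
library(ape)

## Toy tree with labelled tips (A to D) and internal nodes (N1 to N3); N1 is the root
tree <- read.tree(text = "((A,B)N2,(C,D)N3)N1;")

## Made-up lengths, one row per non-root label, deliberately shuffled
df <- data.frame(length = c(4, 6, 1, 3, 5, 2),
                 row.names = c("D", "N3", "A", "C", "N2", "B"))

## Labels indexed by node number: tips first, then internal nodes
all_labels <- c(tree$tip.label, tree$node.label)

## Each edge is identified by the label of its child (second column of the edge table)
child_labels <- all_labels[tree$edge[, 2]]

## Fail early if a label has no matching row in df
stopifnot(all(child_labels %in% rownames(df)))

## Look the lengths up by row name, so the row order of df no longer matters
tree$edge.length <- df[child_labels, "length"]

## Inspect which length ended up on which edge
data.frame(child = child_labels, length = tree$edge.length)
```

Looking rows up by name rather than by position is what guarantees the match; if the stopifnot() check fails, some label in the tree has no row in the data.frame.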