Trying to write data into newick format R

1k views Asked by At

I have a dataset with different levels of branches starting from the top level: stock -> mbranch -> sbranch -> lsbranch. I want to be able to visualize these levels of my data into Newick format. I have different language groups within each stock level and would like to make different trees based off of these highest level groups.

For example my data is in the format as follows:

sample= data.frame("stock" = c("A", "A", "B", "B", "B"), "mbranch" = c("C", "D", "E", "F", "G"), "sbranch" = c("H", "O", NA, "K", "L"), "lsbranch" = c("I", "J", NA, "M", "N"), "name" = c("Andrea", "Kevin", "Charlie", "Naomi", "Sam"))

And I am trying to have an output of the newick tree format which would be something like:

tree = "(A(C(H(I(Andrew))),D(O(J(Kevin)))),B(E(Charlie),F(K(M(Naomi))),G(L(N(Sam)))));"
plot(read.dendrogram(tree))

I'm doing this so later on I can do a distance matrix of the nodes of my outputted tree.

Would the function write.tree be able to analyze data like this and make a tree from this (assuming my actual dataset is much larger)? Or in general, a function that would output the tree format. Thanks

1

There are 1 answers

0
Thomas Guillerme On BEST ANSWER

You can use the ape::read.tree() function to read your newick format tree

tree = "(A(C(H(I(Andrew))),D(O(J(Kevin)))),B(E(Charlie),F(K(M(Naomi))),G(L(N(Sam)))));"
my_tree <- read.tree(text = tree)
plot(my_tree)

You can then use ape::write.tree to save the tree into a newick file:

write.tree(my_tree, file = "my_file_name.tre")

To convert your table into a "phylo" object from ape you can use this function (that might need some adjustements):

## The function
data.frame.to.phylo <- function(sample){
    ## Making an edge table
    edge_table <- rbind(
        ## The root connecting A to B
        rbind(c("root", "A"),c("root", "B")),
        ## All the nodes connecting to the tips
        cbind(sample$stock, sample$name)
        )

    ## Translating the values in the edge table into edge IDs
    ## The order must be tips, root, nodes
    element_names <- c(unique(sample$name), "root", unique(sample$stock))
    element_ids   <- seq(1:length(element_names))

    ## Looping through each ID and name
    for(element in element_ids) {
        edge_table <- ifelse(edge_table == element_names[element], element_ids[element], edge_table)
    }

    ## Make numeric
    edge_table <- apply(edge_table, 2, as.numeric)

    ## Build the phylo object
    phylo_object <- list()
    phylo_object$edge <- edge_table
    phylo_object$tip.label <- unique(sample$name)
    phylo_object$node.label <- c("root", unique(sample$stock))
    phylo_object$Nnode <- length(phylo_object$node.label)

    ## Forcing the class to be "phylo"
    class(phylo_object) <- "phylo"
    return(phylo_object)
}

## The data
sample = data.frame("stock" = c("A", "A", "B", "B", "B"), "mbranch" = c("C", "D", "E", "F", "G"), "sbranch" = c("H", "O", NA, "K", "L"), "lsbranch" = c("I", "J", NA, "M", "N"), "name" = c("Andrea", "Kevin", "Charlie", "Naomi", "Sam"))

## Plotting the data.frame for testing the function
plot(data.frame.to.phylo(sample))

Cheers, Thomas