I'm trying to create an array in Python that contains the pairwise distances between every pair of tips on a phylogenetic tree. I'm currently using DendroPy to do this. (I initially looked at Biopython but couldn't find an option for it.) The code I have so far looks like this:

import dendropy

tree_data = []
tree = dendropy.Tree.get(path="gonno_microreact_tree.nwk",schema="newick")
pdc = tree.phylogenetic_distance_matrix()
for i, t1 in enumerate(tree.taxon_namespace[:-1]):
    for t2 in tree.taxon_namespace[i+1:]:
        tip_pair = {}
        tip_dist_list = []
        tip_pair[t1] = t2
        distance = pdc(t1, t2)
        tip_dist_list.append(tip_pair)
        tip_dist_list.append(distance)
        tree_data.append(tip_dist_list)
print(tree_data)

This works well except for the way it writes the tip labels. For example, an entry in the tree_data list looks like this:

[{<Taxon 0x7fc4c160b090 'ERS135651'>: <Taxon 0x7fc4c160b150 'ERS135335'>}, 0.0001294946558138355]

But the tips in the newick file are just labelled ERS135651 and ERS135335 respectively. How can I get dendropy to write the array with just the original tip labels so this entry would look like this:

 [{ERS135651:ERS135335}, 0.0001294946558138355]

(Also I read the dendropy documentation and I'm aware that it says to use treecalc to do this, like this:

pdc = treecalc.PatristicDistanceMatrix(tree)

But I just get an error saying the command does not exist:

AttributeError: 'module' object has no attribute 'PairisticDistanceMatrix'

)

Any suggestions for how I can get this working?

There are 1 answers

Ben Jeffrey On BEST ANSWER

Converting a tip label to a string gives the name surrounded by quotation marks, e.g.:

t1 = str(t1)
print(t1)

Gives:

"'ERS135651'"

So using str.replace to remove the extra quotation marks converts the tip label back to its proper name, e.g.:

t1 = t1.replace("'","")
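An alternative that may avoid the string clean-up altogether is to read each taxon's `label` attribute, which in DendroPy holds the plain tip name. Below is a minimal sketch of that idea, not a tested drop-in: `pairwise_label_distances`, `tree_label_distances` and `FakeTaxon` are hypothetical names made up for illustration, and the DendroPy calls are the ones from the question, assumed to behave as described there.

```python
from collections import namedtuple
from itertools import combinations


def pairwise_label_distances(taxa, distance_fn):
    """Return [[{label1: label2}, distance], ...] for every unordered pair.

    taxa        -- sequence of objects with a .label attribute
    distance_fn -- callable taking two taxa and returning a number
    """
    rows = []
    for t1, t2 in combinations(taxa, 2):
        # Use the plain label string rather than the Taxon object itself.
        rows.append([{t1.label: t2.label}, distance_fn(t1, t2)])
    return rows


def tree_label_distances(path):
    """Hypothetical wrapper around the DendroPy calls from the question."""
    import dendropy  # imported here so the helper above has no dependency

    tree = dendropy.Tree.get(path=path, schema="newick")
    pdc = tree.phylogenetic_distance_matrix()
    return pairwise_label_distances(list(tree.taxon_namespace), pdc)


# Stand-in objects, only to show the output shape without a tree file.
FakeTaxon = namedtuple("FakeTaxon", "label")
demo_taxa = [FakeTaxon("A"), FakeTaxon("B"), FakeTaxon("C")]
demo = pairwise_label_distances(demo_taxa, lambda a, b: 1.0)
print(demo)
```

With a real tree you would call something like `tree_label_distances("gonno_microreact_tree.nwk")`; the stand-in taxa at the bottom are there only so the output shape can be checked without a Newick file.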