How to read in weighted edgelist with igraph in Python (not in R)?

I want to create a graph from the nodes in the first two columns, with edge lengths proportional to the values in the third column. My input data looks like this:

E06.1644.1  A01.908.1   0.5
E06.1643.1  A01.908.1   0.02 
E06.1644.1  A01.2060.1  0.7

I am currently importing it like this:

import igraph

g = igraph.Graph.Read_Ncol("igraph.test.txt", names=True, directed=False, weights=True)
igraph.plot(g, "igraph.pdf", layout="kamada_kawai")

When I print the names or the weights (which I intend to be the edge lengths), they print out fine with:

print(g.vs["name"])
print(g.es["weight"])

However, the vertices are blank, and the lengths do not seem to be proportional to their values. Also, there are too many nodes (A01.908.1 is duplicated). What am I doing wrong? Thanks in advance....

Answer by Tamás (best answer)

The vertices are blank because igraph does not use the name attribute as vertex labels automatically. If you want to use the names as labels, you have two options:

  1. Copy the name vertex attribute to the label attribute: g.vs["label"] = g.vs["name"]

  2. Tell plot explicitly that you want it to use the names as labels: plot(g, "igraph.pdf", layout="kamada_kawai", vertex_label=g.vs["name"])

I guess the same applies to the weights; igraph does not use the weights automatically to determine the thickness of each edge. If you want to do this, rescale the weight vector to a meaningful thickness range (say, from 0.5 to 3) and then set the rescaled vector as the width edge attribute:

>>> g.es["width"] = rescale(g.es["weight"], out_range=(0.5, 3))

Alternatively, you can also use the edge_width keyword argument in the plot() call:

plot(g, ..., edge_width=rescale(g.es["weight"], out_range=(0.5, 3)))

See help(Graph.__plot__) for more details about the keyword arguments that you can pass to plot().

As for the duplicated node, I strongly suspect that there is a typo in your input file and the two names are not equivalent; one could have a space at the end for instance. Inspect g.vs["name"] carefully to see if this is the case.
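
One way to do that inspection is sketched below. It only catches names that differ by leading or trailing whitespace, which is just one possible cause; the helper find_whitespace_variants() and the sample names are made up for illustration, and in practice you would pass g.vs["name"] instead.

```python
from collections import defaultdict


def find_whitespace_variants(names):
    """Group names that differ only by leading/trailing whitespace."""
    groups = defaultdict(set)
    for name in names:
        groups[name.strip()].add(name)
    # Keep only the groups where more than one raw spelling was seen.
    return {key: sorted(raw) for key, raw in groups.items() if len(raw) > 1}


# Hypothetical vertex names; in practice pass g.vs["name"].
names = ["E06.1644.1", "A01.908.1", "E06.1643.1", "A01.908.1 ", "A01.2060.1"]
variants = find_whitespace_variants(names)
for clean, raw in variants.items():
    # repr() makes the stray whitespace visible.
    print(clean, "->", [repr(r) for r in raw])
```

If the result is empty, the names differ in some other way (for example a different character that looks similar), and printing repr() of each name is still a good first step.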

Update: if you want the lengths of the edges to be proportional to the prescribed weights, I'm afraid that this cannot be done exactly in the general case - it is easy to come up with a graph where the prescribed lengths cannot be achieved in 2D space. There is a technique called multidimensional scaling (MDS) which could reconstruct the positions of the nodes from a distance matrix - but this requires that a distance is specified for each pair of nodes (i.e. also for disconnected pairs).
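
The smallest example of such a graph is a triangle whose prescribed lengths violate the triangle inequality, e.g. 1, 1 and 3: the two short edges together cannot span the long one, so no placement of the three nodes works. The check below is only an illustration of this one obstacle; lengths_fit_triangle() is a made-up helper, not part of igraph.

```python
def lengths_fit_triangle(a, b, c):
    """Return True if three edge lengths can form a (possibly flat) triangle."""
    longest = max(a, b, c)
    # The longest side must not exceed the sum of the other two.
    return longest <= (a + b + c) - longest


print(lengths_fit_triangle(3, 4, 5))  # True
print(lengths_fit_triangle(1, 1, 3))  # False
```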

The Kamada-Kawai layout algorithm that you have used is able to take edge weights into account to some extent (it is likely to get stuck in local minima so you probably won't get an exact result), but it interprets the weights as similarities, not distances, therefore the larger the weight is, the closer the endpoints will be. However, you still have to tell igraph to use the weights when calculating the layout, like this:

>>> similarities = [some_transformation(weight) for weight in g.es["weight"]]
>>> layout = g.layout_kamada_kawai(weights=similarities)
>>> plot(g, layout=layout, ...)

where some_transformation() is a "reasonable" transformation from distance to similarity. This requires some trial-and-error; I usually use a transformation based on a sigmoid function that transforms the median distance to a similarity of 0.5, the (median + 2 sd) distance to 0.1 and the (median - 2 sd) distance to 0.9 (where sd is the standard deviation of the distance distribution) - but this is not guaranteed to work in all cases.
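
A minimal sketch of such a sigmoid-based transformation is shown below. It assumes a logistic function centred on the median and uses the population standard deviation; both choices, and the helper name make_distance_to_similarity(), are assumptions made for illustration. The steepness is chosen so that (median + 2 sd) maps to 0.1, which by symmetry maps (median - 2 sd) to 0.9.

```python
import math
import statistics


def make_distance_to_similarity(distances):
    """Build a sigmoid that maps distances to similarities in (0, 1)."""
    median = statistics.median(distances)
    sd = statistics.pstdev(distances)  # population sd (an assumption)
    if sd == 0:
        # All distances are equal, so every edge gets the same similarity.
        return lambda distance: 0.5
    # Choose the steepness k so that median + 2*sd maps to 0.1:
    # 1 / (1 + exp(2 * sd * k)) = 0.1  =>  k = ln(9) / (2 * sd)
    steepness = math.log(9) / (2 * sd)

    def transform(distance):
        return 1.0 / (1.0 + math.exp(steepness * (distance - median)))

    return transform


distances = [0.5, 0.02, 0.7]  # the weights from the question
some_transformation = make_distance_to_similarity(distances)
similarities = [some_transformation(d) for d in distances]
```

The resulting similarities list can then be passed as the weights argument of the layout call shown earlier. Shorter distances end up with similarities above 0.5 and longer ones below it.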