Neighbor edges sorted based on edge weights in networkx (Python)

5.7k views Asked by At

I have build a graph based on networkx in which edges represent distance between them.

  GraphLoc.G.add_edge(fnode_id, snode_id, score=score)

score is the edge weight.

I am not able to find API which can provide neighboring nodes which has edge and results are in sorted order of weight.

Obviously i can also sort on my own , but i don't want run-time computation. So does networkx provide any solution for this

3

There are 3 answers

1
Tim McNamara On

Selecting a single node will return its neighbours. You can then sort the edge list yourself fairly easily. First, we set up the graph.

>>> G = nx.Graph()
>>> G.add_edge('a', 'b', score=3)
>>> G.add_edge('b', 'c', score=4)
>>> G.add_edge('a', 'c', score=1)

If we wanted a's neighbors, we just access that node directly:

>>> G['a']
{'b': {'score': 3}, 'c': {'score': 1}}

To sort those results, we use tools from the standard Python toolbox. .items() to convert the dict to a tuple and the sorted builtin to sort the results:

>>> sorted(G['a'].items(), key=lambda edge: edge[1]['score'])
[('c', {'score': 1}), ('b', {'score': 3})]

If you need to make the relationship between the results and your original node explicit, it's easy to include this in the result with a list comprehension:

>>> neighbors = sorted(G['a'].items(), key=lambda edge: edge[1]['score'])
>>> [('a', node) for node, _metadata in neighbors]
[('a', 'c'), ('a', 'b')] 
0
jme On

First you should do some tests and make certain that sorting a node's neighbors by score is a bottleneck in your code. In many graphs arising from typical sources of data, a node is not likely to have many neighbors. In this case, sorting the neighbors by score is very cheap, and isn't worth precomputing.

If you've decided for one reason or another that this is not sufficient, and that you must presort the nodes, I think your best bet is to subclass networkx.Graph and supply your own mapping type for its internal representation of a graph. Note that this is only documented in the development version of networkx, so this functionality isn't guaranteed to be available or stable quite yet.

There are two approaches, the second being more general but more complicated. In the first case, I'll assume that you have control over the order in which you add edges to the graph, and that you can add edges all at once. In the second case I'll make no such assumption, but the code will be more complicated.

Case 1: Edges are inserted in sorted order

In this case, I assume that you can insert edges in order of increasing score. All that needs to be done is to supply your subclass with a factory for the adjacency mapping which preserves the order of edges. collections.OrderedDict will do:

import networkx as nx
import collections

class OrderedGraph(nx.Graph):
    adjlist_dict_factory = collections.OrderedDict

g = OrderedGraph()
g.add_edge(1, 3, score=17)
g.add_edge(1, 2, score=42)
g.add_edge(1, 4, score=55)

Notice that the edges were added in increasing order, according to their score! Now if we write:

>>> g.neighbors(1)
[3, 2, 4]

as desired. Note, though, that if we add an edge at a later time, it must have a larger score than any of its neighboring edges for the sorted score invariant to remain unbroken. That is, you can't run the above code and afterwards do this:

>>> g.add_edge(1, 5, score=1)

and expect it to work. You'll get:

>>> g.neighbors(1)
[3, 2, 4, 5]

where [5, 3, 2, 4] was desired. If you cannot add edges all at one time, then, and cannot guarantee that they are only inserted in sorted order, you'll need a more general implementation, such as that in the next case.

Case 2: Edges are inserted in arbitrary order

In this case we need a class that acts like a mapping, but that keeps track of the inserted nodes in increasing order of their score. To do this, we'll use a heap together with a dictionary in a class that inherits from collections.abc.MutableMapping. The heap will keep the nodes in order of their score at the cost of extra memory.

The below implementation is very rough, so use it with caution:

import heapq

class SortedScoreMap(collections.abc.MutableMapping):

    def __init__(self):
        self._data = {}
        self._heap = []

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        if key in self._data:
            self._heap_remove_key(key)
        self._data[key] = value
        heapq.heappush(self._heap, (value['score'], key))

    def __delitem__(self, key):
        del self._data[key]
        self._heap_remove_key(key)

    def __iter__(self):
        yield from (key for (score, key) in self._heap)

    def __len__(self):
        return len(self._data)

    def _heap_remove_key(self, key_to_remove):
        entries = []
        for score, key in self._heap:
            if key == key_to_remove:
                entries.append((score, key))

        for entry in entries:
            self._heap.remove(entry)

SortedScoreMap expects its values to be dictionaries with a score key. This isn't enforced by the above code in the interest of space. Here's a demo:

>>> sm = SortedScoreMap()
>>> sm[5] = {'score': 17}
>>> sm[7] = {'score': 2}
>>> sm[1] = {'score': 42}
>>> list(sm.keys())
[7, 5, 1]
>>> sm[7] = 99
[5, 1, 7]

As you can see, this keeps the keys in order of score. Now we use this as the adjacency mapping factory:

import networkx as nx
import collections

class OrderedGraph(nx.Graph):
    adjlist_dict_factory = SortedScoreMap

g = OrderedGraph()
g.add_edge(1, 3, score=17)
g.add_edge(1, 2, score=42)
g.add_edge(1, 4, score=55)

Now if we write:

>>> g.neighbors(1)
[3, 2, 4]

as expected, and we can even do:

>>> g.add_edge(1, 5, score=1)
>>> g.neighbors(1)
[5, 3, 4, 2]

and it works!

Now, this comes at the cost of extra memory (each key is effectively duplicated) and computation, as an extra layer of indirection is needed. Depending on the size of your problem, you might find that sorting the neighbors every time you need them is actually faster than this approach. So as I said at the beginning: profile your code, find out what the actual bottleneck is, and only then implement an improvement.

0
dschult On

The reporting order of G.edges is NOT the same as the order they were added to the graph. To sort the edges, manipulate the edges using G.edges

The edges are reported with G.edges, and you can ask for edges involving a node n using G.edges(n) and that has an optional keyword argument to get the weight: G.edges(n, data="weight") as a container of edge tuples: (n, nbr, wt). Then you can sort the result based on the last entry in the edge tuple:

sorted(G.edges(n, data="weight"), key=lambda e:e[-1])
# or to sort all the edges in G
sorted(G.edges(data="weight"), key=lambda e:e[-1])