What clustering algorithms can I consider for graph?

152 views Asked by At

Suppose I have an undirected weighted connected graph. I want to group vertices that have highest edges' value all together (vertices degree). Using clustering algorithms is one way. What clustering algorithms can I consider for this task? I hope it is clear; any question for clarification, please ask. Thanks.

1

There are 1 answers

0
AvidLearner On

There are two main approach - giving your graph as an input to existing tool, or using expert knowledge you have on this graph (and its domain) in order to create a representation, and then apply machine learning methods on it.

I'll start with the second approach:

If you have only the nodes and edges (no farther data for each node), you first need to think of a representation for each node\edge. I going to explain about nodes, but it should should be similar for edges' case.

The simplest approach is to represent each node n as a connectivity vector:

enter image description here

Every node will be represented as n=(Ia(n),Ib(n),Ic(n),Id(n),Ie(n)), where Ii(n)=1 in case node n is a 'friend' (neighbor) of node i, and 0 otherwise. (e.g.a=(0,1,1,0,1))

Note that you can decide if a node is a friend of itself.

Second approach, which is quite similar to the first one, is to use edges' weights vector:

n=(W(a,n),W(b,n),W(c,n),W(d,n),W(e,n)) , where W(i,n) is the weight of the edge (i,n).

There are a few more ways to represent nodes, but this is enough in order to run some calculations on it.

After you have this presentation, you can start applying some clustering algorithms on it.

kmeans is considered great for this task, and sklearn has a great implementation. It has some parameters you can (and should) configure (i.e. the distance measure).

The product of kmeans, is k different non-intersecting groups of nodes.

If you want to pass you graph to an algorithm and get some measures, there are more advanced algorithms you can apply. community detection is used to find communities in a graph. Again, there is a nice python implementation in the networkxpackage.