Graph partition algo with Neo4j graph database

2.3k views Asked by At

I know there has some famous graph partition algo tools like METIS which is implemented by karypis Lab (http://glaros.dtc.umn.edu/gkhome/metis/metis/overview)

but I wanna know is there any method to partition graph stored in Neo4j? or I have to dump the Neo4j's data and transform the node and edge format manually to fit the METIS input format?

2

There are 2 answers

2
boggle On

Having worked independently with METIS and Neo4j in the past, I am not aware of any tool for generating a METIS file from Neo4j. That being said, writing such a tool should be an easy task and would be a great community contribution.

Another approach for integrating METIS with Neo4j might be in connecting METIS to Neo4j from C++ via JNI. However this is going to be much more involved as it would have to take care of things like transactions, concurrency etc.

On the more general question of partitioning graphs, it is quite possible to implement some of the more known and simple algorithms with reasonable effort.

1
Alex Averbuch On

Regarding new-ish and interesting algorithms, this is by no means exhaustive or state of the art, but these are the first places I would look:

Specific Algorithm: DiDiC (Distributed Diffusive Clustering) - I used it once in my thesis (Partitioning Graph Databases)

  • You iterate over all nodes, then for each node retrieve all neighbors, in order to spread some of "some unit" to all your neighbors
  • Easy to implement.
  • Can be made deterministic
  • Iterative - as it's based on iterations (like Super Steps in Pregel) you can stop it at any time. The longer you leave it the better the result, in theory (though in some cases, on certain graph shapes it can be unstable)
  • When we implemented this we ran it for 100 iterations on a machine with ~30GB RAM, for up to ~4 million nodes - it took no more than two days to complete.

Specific Algorithm: EvoCut "Finding sparse cuts locally using evolving sets" - local probabilistic algorithm from Microsoft - related to these papers

  • Difficult to implement
  • Local algorithm - BFS-like access patterns (random walks)
  • It's been a while since i read that paper, but i remember it was built on clean abstractions:
    • EvoNibble (pluggable - decides how much of neighborhood to add to the current cluster
    • EvoCut (calls EvoNibble multiple times to find the local cluster)
    • EvoPartition (calls EvoCut repeatedly to partition entire graph)
  • Not deterministic

General Algorithm Family: Hierarchical Graph Clustering

From a high level:

  • Coarsen the graph by collapsing nodes into aggregate nodes
    • coarsening strategy is selectable
  • Find clusters in the coarsened/smaller graph
    • clustering strategy is selectable
  • Incrementally decoarsen the graph, refining at the clustering at each step
    • refining strategy is selectable

Notes:

  • If the graph changes slowly (or results don't need to be right up to date) it may be possible to coarsen once (or infrequently) then work with the coarsened graph - to save computation
  • I don't know of a specific algorithm to recommend

General limitations - the things few clustering algorithms do:

  • Node types not acknowledged - i.e., all nodes treated equally
  • Relationship types not acknowledged - i.e., all relationships treated equally
  • Relationship direction not acknowledged - i.e., relationships treated as undirected