I want to find all nodes of a graph, which are in simple cycles of length up to some given maximum cycle length max_length
. Along with this I want to make a histogram of the number of cycles (per cycle length) that each node is contained in. I'm trying to figure out what the best way is to get this task done.
Related question:
For a given node, how to find small cycles of which that node is a member? is a related question which is more specific than this question here. They are not duplicates.
I'm using python and graph-tool for analysing the graph, which seems to be a good choice, because several tests indicate that graph-tool is among the world's top performance graph libraries, esp. for python. I prefer to use python for this. In particular, others were not fast enough (networkx) or had bad time complexity (edge deletion in igraph) and hence didn't scale for other tasks for what I need them.
Here are my ideas for approaches:
Idea 1:
Iterate over all cycles of the graph in the order of ascending cycle length and take note of the nodes therein.
Idea 1.1. (Following up Idea 1.):
There is the function graph_tool.topology.all_circuits
, but that doesn't seem to iterate over only simple cycles or not in the order that I want. I don't really know what it does exactly. I just know that I can't use it, because it has exploding time complexity with my graph. E.g. for the small Zachary's karate club with just 34 nodes, it finds 1462130 many cycles. I want to analyse graphs with about 2 million nodes and 4 million edges. I've tried to run graph_tool.topology.all_circuits
for such a graph, but this way I didn't even get to see many of the shorter cycles that I'm interested in. I estimate that the cycle search on graphs like the ones I want to analyse would run for several years using this approach. Illustration:
import graph_tool.collection
import graph_tool.topology
g = graph_tool.collection.data["karate"]
print("number of nodes in graph:", g.num_vertices())
print()
print(f"circ_idx\tcircuit")
circuit_iter = graph_tool.topology.all_circuits(g)
i = 0
for circuit in circuit_iter:
i += 1
if i % 152371 == 0:
print(f"{i}\t{circuit}")
print("total number of circuits:", i)
Output:
number of nodes in graph: 34
circ_idx circuit
152371 [ 0 3 1 13 2 28 31 25 24 27 33 29 23 32 8]
304742 [ 0 7 1 30 32 23 33 27 24 25 31 28 2 3 12]
457113 [ 0 8 30 1 19 33 18 32 29 23 25 24 31]
609484 [ 0 8 33 27 24 25 23 29 32 2 3]
761855 [ 0 13 1 7 3 2 32 31 25 23 33 19]
914226 [ 0 17 1 7 3 2 32 31 24 27 23 29 33 30 8]
1066597 [ 0 19 33 27 24 25 23 29 32 30 1 3]
1218968 [ 0 31 24 27 23 29 26 33 22 32 2 3 12]
1371339 [ 0 31 32 23 25 24 27 33 13 3 2 1 30 8]
total number of circuits: 1462130
Idea 1.2. (Following up Idea 1.):
Implement Idea 1. myself. Doable, but this sounds like quite some work. I don't want to invent the wheel another time. Is there a better way?
Idea 2:
As an alternative to Idea 1., it's also okay to iterate per node over the cycles that a given node is contained in, again in the order of ascending cycle lengths. This can be done with a breadth first search (BFS), starting from the node in question.
With this I can then check membership of a node in said cycles and then move on to the next node. This is less efficient, because I will process cycles repeatedly for every node that is contained in the same cycle, but it will still give reasonable overall time complexity.
The big advantage of this approach is that the implementation will be easier than Idea 1.
Are there better approaches? Does graph-tool offer something that is more well suited for this task?
Here is some C++ code that builds a histogram of the number of cycles each node in a graph belongs to.
This uses the depth first search cycle finder in the the PathFinder graph theory library. https://github.com/JamesBremner/PathFinder
Processing the Zachary test data set requires 4 milliseconds to find the 42 cycles under or equal length to 5, while building the histogram takes 2 microseconds.
Here is the main code:
The complete application code is at https://github.com/JamesBremner/so77686189
Here is the output: