How to iterate simple cycles in ascending cycle length order?

160 views Asked by At

I want to find all nodes of a graph, which are in simple cycles of length up to some given maximum cycle length max_length. Along with this I want to make a histogram of the number of cycles (per cycle length) that each node is contained in. I'm trying to figure out what the best way is to get this task done.

Related question:
For a given node, how to find small cycles of which that node is a member? is a related question which is more specific than this question here. They are not duplicates.

I'm using python and graph-tool for analysing the graph, which seems to be a good choice, because several tests indicate that graph-tool is among the world's top performance graph libraries, esp. for python. I prefer to use python for this. In particular, others were not fast enough (networkx) or had bad time complexity (edge deletion in igraph) and hence didn't scale for other tasks for what I need them.

Here are my ideas for approaches:

Idea 1:
Iterate over all cycles of the graph in the order of ascending cycle length and take note of the nodes therein.

Idea 1.1. (Following up Idea 1.):
There is the function graph_tool.topology.all_circuits, but that doesn't seem to iterate over only simple cycles or not in the order that I want. I don't really know what it does exactly. I just know that I can't use it, because it has exploding time complexity with my graph. E.g. for the small Zachary's karate club with just 34 nodes, it finds 1462130 many cycles. I want to analyse graphs with about 2 million nodes and 4 million edges. I've tried to run graph_tool.topology.all_circuits for such a graph, but this way I didn't even get to see many of the shorter cycles that I'm interested in. I estimate that the cycle search on graphs like the ones I want to analyse would run for several years using this approach. Illustration:

import graph_tool.collection
import graph_tool.topology

g = graph_tool.collection.data["karate"]

print("number of nodes in graph:", g.num_vertices())

print()
print(f"circ_idx\tcircuit")
circuit_iter = graph_tool.topology.all_circuits(g)
i = 0
for circuit in circuit_iter:
    i += 1
    if i % 152371 == 0:
        print(f"{i}\t{circuit}")

print("total number of circuits:", i)

Output:

number of nodes in graph: 34

circ_idx        circuit
152371  [ 0  3  1 13  2 28 31 25 24 27 33 29 23 32  8]
304742  [ 0  7  1 30 32 23 33 27 24 25 31 28  2  3 12]
457113  [ 0  8 30  1 19 33 18 32 29 23 25 24 31]
609484  [ 0  8 33 27 24 25 23 29 32  2  3]
761855  [ 0 13  1  7  3  2 32 31 25 23 33 19]
914226  [ 0 17  1  7  3  2 32 31 24 27 23 29 33 30  8]
1066597 [ 0 19 33 27 24 25 23 29 32 30  1  3]
1218968 [ 0 31 24 27 23 29 26 33 22 32  2  3 12]
1371339 [ 0 31 32 23 25 24 27 33 13  3  2  1 30  8]
total number of circuits: 1462130

Idea 1.2. (Following up Idea 1.):
Implement Idea 1. myself. Doable, but this sounds like quite some work. I don't want to invent the wheel another time. Is there a better way?

Idea 2:
As an alternative to Idea 1., it's also okay to iterate per node over the cycles that a given node is contained in, again in the order of ascending cycle lengths. This can be done with a breadth first search (BFS), starting from the node in question.
With this I can then check membership of a node in said cycles and then move on to the next node. This is less efficient, because I will process cycles repeatedly for every node that is contained in the same cycle, but it will still give reasonable overall time complexity.
The big advantage of this approach is that the implementation will be easier than Idea 1.

Are there better approaches? Does graph-tool offer something that is more well suited for this task?

2

There are 2 answers

0
ravenspoint On

Here is some C++ code that builds a histogram of the number of cycles each node in a graph belongs to.

This uses the depth first search cycle finder in the the PathFinder graph theory library. https://github.com/JamesBremner/PathFinder

Processing the Zachary test data set requires 4 milliseconds to find the 42 cycles under or equal length to 5, while building the histogram takes 2 microseconds.

Here is the main code:

void findCycles()
{
    raven::set::cRunWatch aWatcher("findCycles");

    /* find cycles in graph
    using modified depth first search
    in PathFinder graph theory library
    https://github.com/JamesBremner/PathFinder
    */
   
    vCycle = dfs_cycle_finder(gd);

}
void histogram()
{
    raven::set::cRunWatch aWatcher("histogram");

    nodeCycleCount.resize(gd.g.vertexCount());
    for (auto &vc : vCycle)
    {
        for (int v : vc)
        {
            nodeCycleCount[v]++;
        }
    }
}

main(int argc, char *argv[])
{
    raven::set::cRunWatch::Start();

    if (argc != 2)
    {
        std::cout << "Usage\n";
        exit(1);
    }
    read(argv[1]);

    findCycles();

    std::cout << vCycle.size() << " cycles found\n";

    histogram();

    std::cout << "Node\tCycles\n";
    for (int n = 0; n < nodeCycleCount.size(); n++)
    {
        std::cout << gd.g.userName(n) << "\t" << nodeCycleCount[n] << "\n";
    }

    raven::set::cRunWatch::Report();

    return 0;
}

The complete application code is at https://github.com/JamesBremner/so77686189

Here is the output:

42 cycles found
Node    Cycles
3       17
1       21
2       10
4       7
5       2
6       4
7       4
8       3
9       4
10      1
11      2
12      0
13      1
14      6
17      1
18      1
20      3
22      2
26      3
24      5
25      3
28      3
29      2
30      4
27      1
31      5
32      9
33      24
15      1
16      1
19      1
21      1
23      1
34      25
raven::set::cRunWatch code timing profile
Calls           Mean (secs)     Total           Scope
       1        0.0035151       0.0035151       findCycles
       1        2e-06   2e-06   histogram

0
Stuart Berg On

There is the function graph_tool.topology.all_circuits, but that doesn't seem to iterate over only simple cycles or not in the order that I want. I don't really know what it does exactly. I just know that I can't use it, because it has exploding time complexity with my graph. E.g. for the small Zachary's karate club with just 34 nodes, it finds 1462130 many cycles.

Note that graph-tool and networkx agree on the total count of simple cycles in the karate club graph.

import networkx as nx
g = nx.karate_club_graph()
g = nx.DiGraph(g)
cycles = nx.simple_cycles(g)
num_cycles = sum(1 for c in cycles)

That snippet takes ~1 minute to run and counts 1462130 cycles (same as graph-tool).

I want to analyse graphs with about 2 million nodes and 4 million edges. I've tried to run graph_tool.topology.all_circuits for such a graph, but this way I didn't even get to see many of the shorter cycles that I'm interested in. I estimate that the cycle search on graphs like the ones I want to analyse would run for several years using this approach.

It seems clear that finding all simple cycles is out of the question, then.

Idea 2: As an alternative to Idea 1., it's also okay to iterate per node over the cycles that a given node is contained in, again in the order of ascending cycle lengths. This can be done with a breadth first search (BFS), starting from the node in question.

If you don't need to count cycles above some length N, then a custom search (either BFS or DFS) may be your only option. Although graph-tool does contain tools for customizing a BFS/DFS search via "visitors", you won't benefit from C++ speed, since the visitor will have to call python functions for each edge/node visited.