My naive maximal clique finding algorithm runs faster than Bron-Kerbosch's. What's wrong?

1.7k views Asked by At

In short, my naive code (in Ruby) looks like:

# $seen is a hash to memoize previously seen sets
# $sparse is a hash of usernames to a list of neighboring usernames
# $set is the list of output clusters

$seen = {}
def subgraph(set, adj)
    hash = (set + adj).sort
    return if $seen[hash]
    $sets.push set.sort.join(", ") if adj.empty? and set.size > 2
    adj.each {|node| subgraph(set + [node], $sparse[node] & adj)}
    $seen[hash] = true
end

$sparse.keys.each do |vertex|
    subgraph([vertex], $sparse[vertex])
end

And my Bron Kerbosch implementation:

def bron_kerbosch(set, points, exclude)
    $sets.push set.sort.join(', ') if set.size > 2 and exclude.empty? and points.empty?
    points.each_with_index do |vertex, i|
        points[i] = nil
        bron_kerbosch(set + [vertex],
                      points & $sparse[vertex],
                      exclude & $sparse[vertex])
        exclude.push vertex
    end
end

bron_kerbosch [], $sparse.keys, []

I also implemented pivoting and degeneracy ordering, which cut down on bron_kerbosch execution time, but not enough to overtake my initial solution. It seems wrong that this is the case; what algorithmic insight am I missing? Here is a writeup with more detail if you need to see fully working code. I've tested this on pseudo-random sets up to a million or so edges in size.

1

There are 1 answers

0
Thomash On BEST ANSWER

I don't know how you generate the random graphs for your tests but I suppose you use a function which generates a number according to a uniform distribution and thus you obtain a graph that is very homogeneous. That's a common problem when testing algorithms on graphs, it is very difficult to create good test cases (it's often as hard as solving the original problem).

The max-clique problem is a well-known NP hard problem and both algorithms (the naive one and the Bron Kerbosch one) have the same complexity so we can't expect a global improvement on all testcase but just an improvement on some particular cases. But because you used a uniform distribution to generate your graph, you don't have this particular case.

That's why performance of both algorithms is very similar on your data. And because Bron Kerbosch algorithm is a little more complex than the naive one, the naive one is faster.