Minimal `set cover` solution in Clojure

240 views Asked by At

I've been trying ways to distill (a lot of) database index suggestions into a set of indexes that are applicable to most databases. To do that it turns out I need to solve a pretty basic but NP complete set theory problem: the minimum set cover problem.

This means that given a set of sets, select a subset of sets s that covers a certain domain u, but if u isn't given make it the union of s. The most optimal subset of sets is the one reaching a certain minimum, usually the minimal amount of sets, but it could also be a minimum in total weight if the sets are weighted.

(def s #{#{1 4 7} #{1 2} #{2 5 6} #{2 5} #{3} #{8 6}})
(def u (apply set/union s))
(set-cover s u)
=> (#{7 1 4} #{6 2 5} #{3} #{6 8})

I implemented a naive version of this by using clojure.math.combinatorics, relying on it returning subsets in order of increasing amounts of sets.

(defn set-cover
  ([s]
   (set-cover s (apply set/union s)))
  ([s u]
   (->> s
        (combo/subsets)
        (filter (fn [s] (= u (apply set/union s))))
        first)))

However this is very slow on larger s, because of the NP nature and the recurring unions (even optimized ones). For my use-case a version supporting weighted sets would also be preferable.

Looking into optimized versions most trails ended up in thesis-land, which I'm regrettably not smart enough for. I found this small python implementation on SO

def setCover(setList,target=None):
    if not setList: return None
    if target  is None: target  = set.union(*setList)
    bestCover = []
    for i,values in enumerate(setList):
        remaining = target - values
        if remaining == target: continue
        if not remaining: return [values]
        subCover = setCover(setList[i+1:],remaining)
        if not subCover: continue
        if not bestCover or len(subCover)<len(bestCover)-1:
            bestCover = [values] + subCover
    return bestCover

It ticks many boxes:

  • work recursively
  • compares partial results as optimization
  • seems suitable for different minimum definitions: count or weight
  • has additional optimizations I can grok
    • which can be done outside of the basic algorithm
      • sorting input sets on high minimum score (size, weight)
      • identifying unique singleton sets in u not found in other sets

I have been trying to translate this into Clojure as a loop-recur function, but couldn't get the basic version of it to work, since there are niggling paradigm gaps between the two languages.

Does anyone have suggestions how I could go about solving this problem in Clojure, either by tips how to convert the python algorithm successfully, or which other Clojure (or even Java) libraries I could use and how ?

1

There are 1 answers

1
Steffan Westcott On BEST ANSWER

Here's a Clojure version of the greedy set cover algorithm i.e. selects a set which covers the most uncovered elements at each iteration. Rather than use loop/recur to build the complete result, it lazily returns each result element using lazy-seq:

(defn greedy-set-cover
  ([xs]
   (greedy-set-cover xs (apply set/union xs)))
  ([xs us]
   (lazy-seq
     (when (seq us)
       (let [x (apply max-key #(count (set/intersection us %)) xs)
             xs' (disj xs x)
             us' (set/difference us x)]
         (cons x (greedy-set-cover xs' us')))))))
(def s #{#{1 4 7} #{1 2} #{2 5 6} #{2 5} #{3} #{8 6}})

(greedy-set-cover s)   ;; = (#{7 1 4} #{6 2 5} #{6 8} #{3})