Finding intersections between a large number of sets with unique elements each

60 views Asked by At

The following is an interesting algorithmic problem.

Inputs:

  • The set of elements S (say, non-negative integers).
  • A mapping N that maps some elements from S to a subset S_k \subset S. (not all elements of S are represented, neither in keys nor in values). Each S_k consists of unique, sorted elements only.

Example inputs:


S = {0, 1, 2, 3, 4, 5, 6, 7, 8}
N = {
   0: [1, 2, 3],
   4: [2, 3, 5, 6],
   6: [5, 7, 8],
}

Let us denote by N[s] the set of elements corresponding to s (e.g., in the example above, N[0] = [1, 2, 3]).

The goal

Build a mapping M that maps keys in N to subsets of keys in N. The mapping should be of the form s_j: S_j where s_j \in S, S_j \subset S, and any s is included in the S_j if N[s_j] and N[s] have common elements.

Example result:


{
   0: [4],
   4: [0, 6],
   6: [4],
}

The parameters are as follows:

  • length of N ~100K elements
  • is the number of elements in the correspondence (|N[s]|) ~10
  • the number of unique elements (|S|) ~100K

The following is my attempt at solving the task. It starts with building an inverted index, then iterating it to construct the result. Should be O(|N|) complexity but I'm unsure.


def merge_sets(sets: Sequence[Set]) -> Set:
    """List of sets in, a single merged set out. """
    if not sets:
        return set()
    return reduce(or_, sets)


def neighbors_to_mutexes(
        indexes: Union[List[int], np.ndarray],
        neighbors_by_index: Union[List[List[int]], np.ndarray],
) -> Mapping[int, Set[int]]:
    """Given a list of indexes and a list of respective neighbors for each index,
    find indexes whose neighbor lists share at least one element."""

    inclusion_by_index = defaultdict(set)  # which sets of neighbours is each index included in
    neighbors_by_index = [set(neighbors) for neighbors in neighbors_by_index]
    global_indexes = merge_sets(neighbors_by_index)
    for index in global_indexes:
        inclusion_by_index[index] = {
            other_index
            for other_index, neighbors in zip(indexes, neighbors_by_index)
            if index in neighbors
        }

    intersecting = defaultdict(set)
    for index, including_set in inclusion_by_index.items():
        for i, j in product(including_set, including_set):
            if i != j:
                intersecting[i].add(j)
                intersecting[j].add(i)

    return intersecting
1

There are 1 answers

0
user2390182 On

Kind of a graph problem with a bipartite graph, where each value in the lists is an intermadiate node on the way to other keys in N.

graph = {k: set(v) for k,v in N.items()}
for k, vals in N.items():
    for val in vals:
        graph.setdefault(val, set()).add(k)
 
result = {}
for k in N:
    for val in graph[k]:
        for k2 in graph[val]:
            if k2 in N and k != k2:
                result.setdefault(k, set()).add(k2)
 
result
# {0: {0, 4}, 4: {0, 4, 6}, 6: {4, 6}}