Assume you have many elements, and you need to keep track of the equivalence relations between them. If element A is equivalent to element B, it is equivalent to all the other elements B is equivalent to.
I am looking for an efficient data structure to encode this information. It should be possible to dynamically add new elements through an equivalence with an existing element, and from that information it should be possible to efficiently compute all the elements the new element is equivalent to.
For example, consider the following equivalence sets of the elements [0,1,2,3,4]:
0 = 1 = 2
3 = 4
where the equality sign denotes equivalence. Now we add a new element 5
0 = 1 = 2
3 = 4
5
and enforcing the equivalence 5=3
, the data structure becomes
0 = 1 = 2
3 = 4 = 5
From this, one should be able to iterate efficiently through the equivalence set for any element. For 5, this set would be [3,4,5].
Boost already provides a convenient data structure called disjoint_sets
that seems to meet most of my requirements. Consider this simple program that illustates how to implement the above example:
#include <cstdio>
#include <vector>
#include <boost/pending/disjoint_sets.hpp>
#include <boost/unordered/unordered_set.hpp>
/*
Equivalence relations
0 = 1 = 2
3 = 4
*/
int main(int , char* [])
{
typedef std::vector<int> VecInt;
typedef boost::unordered_set<int> SetInt;
VecInt rank (100);
VecInt parent (100);
boost::disjoint_sets<int*,int*> ds(&rank[0], &parent[0]);
SetInt elements;
for (int i=0; i<5; ++i) {
ds.make_set(i);
elements.insert(i);
}
ds.union_set(0,1);
ds.union_set(1,2);
ds.union_set(3,4);
printf("Number of sets:\n\t%d\n", (int)ds.count_sets(elements.begin(), elements.end()));
// normalize set so that parent is always the smallest number
ds.normalize_sets(elements.begin(), elements.end());
for (SetInt::const_iterator i = elements.begin(); i != elements.end(); ++i) {
printf("%d %d\n", *i, ds.find_set(*i));
}
return 0;
}
As seen above one can efficiently add elements, and dynamically expand the disjoint sets. How can one efficiently iterate over the elements of a single disjoint set, without having to iterate over all the elements?
Most probably you can't do that,
disjoint_sets
doesn't support iteration over one set only. The underlying data structure and algorithms wouldn't be able to do it efficiently anyway, i.e. even if there was support built in todisjoint_sets
for iteration over one set only, that would be just as slow as iterating over all sets, and filtering out wrong sets.