Find the number of disjoint sets

8.7k views Asked by At

For those not familiar with Disjoint-set data structure.

https://en.wikipedia.org/wiki/Disjoint-set_data_structure

I'm trying to find the no. of groups of friends from the given sets of friends and their relationships. Of course, there is no doubt that this could easily be implemented using BFS/DFS. But I choose to use disjoint set, I also tend to find the friend group the person belongs, etc, and disjoint-set certainly sounds to be appropriate for that case.

I have implemented the Disjoint set data structure, Now I need to find the number of disjoint sets it contains(which will give me the No. of groups).

Now, I'm stuck at implementing on how to find the No. of disjoint-sets efficiently, as the number of friends can be as large as 1 00 00 0.

Options that I think should work.

  1. Attach the new set at the back of the original, and destroy the old set.

  2. Change their parents of each element at every union.

But since the number of friends are huge, I'm not sure if that's the correct approach, Perhaps if there is any other efficient way or should I go ahead and implement any of the above.

Here is my code for additional details.(I'have not implemented the counting disjoint-set here)

//disjoint set concept 

//https://www.topcoder.com/community/data-science/data-science-tutorials/disjoint-set-data-structures/
// initially all the vertices are takes as single set and they are their own representative.
// next we see, compare two vertices, if they have same parent(representative of the set), we leave it.
// if they don't we merge them it one set.
// finally we get different disjoint sets.

#includes ...
using namespace std;

#define edge pair<int, int>
const int max 1000000;
vector<pair<int, edge > > graph, mst;
int N, M;
int parent[max];

int findset(int x, int* parent){
 //find the set representative.
    if(x != parent[x]){ 
        parent[x] = findset(parent[x], parent);
    }

    return parent[x];
}
void disjoints(){
    for(int i=0; i<M; i++){
        int pu = findset(graph[i].second.first, parent);
        int pv = findset(graph[i].second.second, parent);

        if(pu != pv){ //if not in the same set.
            mst.push_back(graph[i]);
            total += graph[i].first;
            parent[pu] = parent[pv]; // create the link between these two sets
        }
    }
}
 void noOfDisjoints(){
  //returns the No. of disjoint set.
 }
void reset(){
    for(int i=0; i<N; i++){
        parent[i] = i;
    }
}

int main() {
            cin>>N>>M; // No. of friends and M edges
        int u,v,w;    // u= source, v= destination, w= weight(of no use here).  
        reset();
        for(int i =0; i<M ;i++){
            cin>>u>>v>>w;
            graph.push_back(pair<int, edge>(w,edge(u,v)));
        }
        disjoints();
        print();
    return 0;
}
2

There are 2 answers

1
amit On BEST ANSWER

Each union operaiton on two items a,b in Disjoint Set Data Structure has two possible scenarios:

  1. You tried to unite items from the same set. In this case, nothing is done, and number of disjoint sets remain the same.
  2. You united items from two different sets, so you basically converged two sets into one - effectively decreasing the number of disjoint sets by exactly one.

From this, we can conclude that it is easy to find the number of disjoint sets at every moment by tracking the number of unions of type (2) from the above.
If we denote this number by succ_unions, then the total number of sets at each point is number_of_initial_sets - succ_unions.

1
templatetypedef On

If all you need to know is the number of disjoint sets and not what they are, one option would be to add in a counter variable to your data structure counting how many disjoint sets there are. Initially, there are n of them, one per individual element. Every time you perform a union operation, if the two elements don't have the same representative, then you know you're merging two disjoint sets into one, so you can decrement the counter. That would look something like this:

if (pu != pv){ //if not in the same set.
    numDisjointSets--;  // <--- Add this line
    mst.push_back(graph[i]);
    total += graph[i].first;
    parent[pu] = parent[pv]; // create the link between these two sets
}

Hope this helps!