Implementing Disjoint Set Data Structure in Python

1.3k views Asked by At

I'm working on a small project involving cluster, and I think the code given here https://www.ics.uci.edu/~eppstein/PADS/UnionFind.py might be a good starting point for my work. However, I have come across a few difficulties implementing it to my work:

  1. If I make a set containing all my clusters cluster=set([0,1,2,3,4,...,99]) (there are 100 points with the numbers labelling them), then I would like to to group the numbers into cluster, do I simply write cluster=UnionFind()? Now what is the data type of cluster?

  2. How can I perform the usual operations for set on cluster? For instance, I would like to read all the points (which may have been grouped together) in cluster, but type print cluster results in <main.UnionFind instance at 0x00000000082F6408>. I would also like to keep adding new elements to cluster, how do I do it? Do I need to write the specific methods for UnionFind()?

  3. How do I know all the members of a group with one of its member is called? For instance, 0,1,3,4 are grouped together, then if I call 3, I want it to print 0,1,3,4, how do I do this?

thanks

1

There are 1 answers

0
Hesham Attia On BEST ANSWER

Here's a small sample code on how to use the provided UnionFind class.

Initialization

The only way to create a set using the provided class is to FIND it, because it creates a set for a point only when it doesn't find it. You might want to create an initialization method instead.

union_find = UnionFind()
clusters = set([0,1,2,3,4])
for i in clusters:
    union_find[i]

Union

# Merge clusters 0 and 1
union_find.union(0, 1)
# Add point 2 to the same set
union_find.union(0, 2)

Find

# Get the set for clusters 0 and 1
print union_find[0]
print union_find[1]

Getting all Clusters

# print all clusters and their sets
for cluster in union_find:
    print cluster, union_find[cluster]



Note:

There is no direct way that gets you all the points given a cluster number. You can loop over all the points and pick the ones that have the required cluster number. You might want to modify the given class to support that operation more efficiently.