How can I track a specific item's cluster membership in hierarchical clustering?


I have a question about hierarchical clustering. I have a relatively complex dataset with 2000 items/samples. I cluster the items with scipy and cut the tree at different cutoffs, e.g. from 0.1 to 0.9:

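from scipy.cluster import hierarchy as hac

# The method name must be a string ('single'); 'euclidean' only matters
# if distance holds raw observations rather than a condensed distance matrix
Z = hac.linkage(distance, 'single', 'euclidean')
results = hac.fcluster(Z, cutoff, 'distance')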
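
The question doesn't show how `distance` is built. Assuming the raw data is an `(n_samples, n_features)` array and `distance` is the condensed distance vector from `scipy.spatial.distance.pdist`, a self-contained version of the snippet on synthetic data might look like this. The key point is that `fcluster` returns one label per item, in the same order as the input rows, so the cluster of item `i` at a given cutoff is simply `results[i]`. The data, seed and item index are made up for illustration.

```python
import numpy as np
from scipy.cluster import hierarchy as hac
from scipy.spatial.distance import pdist

# Hypothetical stand-in for the real data: 2000 items with 5 features each
rng = np.random.default_rng(0)
data = rng.random((2000, 5))

# Condensed pairwise Euclidean distances (length n*(n-1)/2)
distance = pdist(data, 'euclidean')

# Single-linkage hierarchy; the metric argument is ignored for condensed input
Z = hac.linkage(distance, 'single')

cutoff = 0.1
results = hac.fcluster(Z, cutoff, 'distance')

# results[i] is the flat-cluster ID of item i (IDs start at 1)
item = 42  # hypothetical item of interest
print(f"item {item} is in cluster {results[item]} at cutoff {cutoff}")
```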

How can I check which cluster a certain item belongs to at each cutoff? For example, at cutoff 0.1 it is in group x, at cutoff 0.2 it is in group y, and so on.
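
Because each `fcluster` call returns labels aligned with the input items, one way to track an item is to run `fcluster` once per cutoff and read off that item's label each time. The cluster IDs themselves are not guaranteed to be comparable between cutoffs (cluster 3 at 0.1 need not be related to cluster 3 at 0.2), so this sketch also records which items share the cluster. It assumes `Z` comes from `linkage` as in the question; `track_item`, the synthetic data and the item index are hypothetical.

```python
import numpy as np
from scipy.cluster import hierarchy as hac
from scipy.spatial.distance import pdist

def track_item(Z, item, cutoffs):
    """Return {cutoff: (cluster_id, indices of all items in that cluster)}."""
    history = {}
    for cutoff in cutoffs:
        labels = hac.fcluster(Z, cutoff, 'distance')
        cluster_id = labels[item]
        members = np.flatnonzero(labels == cluster_id)
        history[cutoff] = (cluster_id, members)
    return history

# Hypothetical synthetic data standing in for the real 2000 items
rng = np.random.default_rng(0)
data = rng.random((2000, 5))
Z = hac.linkage(pdist(data), 'single')

cutoffs = np.round(np.linspace(0.1, 0.9, 9), 1)  # 0.1, 0.2, ..., 0.9
history = track_item(Z, item=42, cutoffs=cutoffs)
for cutoff, (cluster_id, members) in history.items():
    print(f"cutoff {cutoff}: cluster {cluster_id}, {len(members)} items")
```

With single linkage, raising the cutoff only merges clusters, so the set of items sharing the tracked item's cluster can only grow from one cutoff to the next.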

I considered plotting the dendrogram, but wouldn't tracking one item among 2000 samples in a dendrogram be too messy?
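
Instead of reading a dendrogram, scipy can cut the same hierarchy at several heights in one call with `scipy.cluster.hierarchy.cut_tree`, which returns an `(n_items, n_heights)` array; row `i` is then item `i`'s membership at every cutoff. A sketch on hypothetical synthetic data follows. Two caveats: `cut_tree` numbers clusters from 0 (while `fcluster` starts at 1), and a merge that lands exactly on a cutoff may be treated differently than by `fcluster`'s `'distance'` criterion, so compare the groupings rather than the raw IDs.

```python
import numpy as np
from scipy.cluster import hierarchy as hac
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
data = rng.random((2000, 5))  # hypothetical stand-in for the real data
Z = hac.linkage(pdist(data), 'single')

cutoffs = np.linspace(0.1, 0.9, 9)
# One column per cutoff, one row per item
memberships = hac.cut_tree(Z, height=cutoffs)

item = 42  # hypothetical item of interest
for cutoff, label in zip(cutoffs, memberships[item]):
    print(f"cutoff {cutoff:.1f}: cluster {label}")
```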


Answer by AudioBubble

Try building the set of cluster IDs with set(results) to remove duplicates, then loop over the IDs and select the items that belong to each cluster. Give it a try; I couldn't test it because you didn't provide sample data.

Your code would look like:

import numpy as np

clusterIDs = set(results)  # unique cluster IDs at this cutoff
D = {}  # Dictionary mapping clusterID -> array of the points in that cluster
for clusterID in clusterIDs:
    # Boolean mask picks the rows of data (a NumPy array) assigned to clusterID
    D[clusterID] = data[results == clusterID]
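
The grouping above covers a single cutoff. A sketch that builds the same kind of dictionary for every cutoff, storing item indices instead of data rows so a given item can be looked up afterwards, might look like this. `groups_by_cutoff` and `find_item` are hypothetical helper names, the data is synthetic, and `Z` is assumed to come from `linkage` as in the question.

```python
import numpy as np
from scipy.cluster import hierarchy as hac
from scipy.spatial.distance import pdist

def groups_by_cutoff(Z, cutoffs):
    """Return {cutoff: {cluster_id: array of item indices in that cluster}}."""
    out = {}
    for cutoff in cutoffs:
        labels = hac.fcluster(Z, cutoff, 'distance')
        out[cutoff] = {cid: np.flatnonzero(labels == cid) for cid in np.unique(labels)}
    return out

def find_item(groups, item):
    """Return {cutoff: cluster_id} for the cluster that contains item."""
    return {cutoff: next(cid for cid, idx in clusters.items() if item in idx)
            for cutoff, clusters in groups.items()}

rng = np.random.default_rng(0)
data = rng.random((2000, 5))  # hypothetical synthetic data
Z = hac.linkage(pdist(data), 'single')

groups = groups_by_cutoff(Z, [0.1, 0.2, 0.5])
print(find_item(groups, 42))  # 42 is a hypothetical item index
```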