Varying cluster labels in DBSCAN

Question

2.9k views Asked by Aravind At 05 January 2017 at 14:33

I am using DBSCAN from sklearn in Python to cluster some data points, with a precomputed distance matrix as input.

import sklearn.cluster as cl
C = cl.DBSCAN(eps=2, metric='precomputed', min_samples=2)
db = C.fit(Dist_Matrix)

Dist_Matrix is the precomputed distance matrix. Each time I run my code, I get different cluster labels for the data points, and the number of clusters also varies. For example, in the first run the labels are

[ 2  3  3  0  3  0  2  2  2  4  2 -1  0  0  0  1  4  0  1  0  1  3  0  3  0
0  1 -1  0  3  1  3  0  0  2  0  2  0 -1  0  0  3  0  0  0  1  0  1  0  0]

In another run, they are

[ 0  2  2  1  2  1  0  0  0  3  0 -1  1  1  1  0  3  1  0  1  0  2  1  2  1
1  0 -1  1  2  0  2  1  1  0  1  0  1 -1  1  1  2  1  1  1  0  1  0  1  1]

How can I resolve this? Please help

There are 2 answers

**Has QUIT--Anony-Mousse** · Answer 1 · 2017-01-06T16:22:50+00:00

Clustering will usually not assign the same labels from run to run, because the label itself is meaningless. The only valuable information is which objects go together.
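To illustrate the point about labels being arbitrary, here is a minimal sketch that checks whether two label vectors group the objects identically, ignoring which integer each group received. The helper name `same_partition` is made up for this example, and it treats the noise label -1 as just another group. The inputs are the first twelve labels of each run from the question.

```python
def same_partition(a, b):
    """Return True if a and b describe the same grouping up to renaming of labels."""
    if len(a) != len(b):
        return False
    forward, backward = {}, {}
    for x, y in zip(a, b):
        x, y = int(x), int(y)
        # Each label in a must map to exactly one label in b, and vice versa.
        if forward.setdefault(x, y) != y or backward.setdefault(y, x) != x:
            return False
    return True


# First twelve labels of the two runs shown in the question.
first_run = [2, 3, 3, 0, 3, 0, 2, 2, 2, 4, 2, -1]
second_run = [0, 2, 2, 1, 2, 1, 0, 0, 0, 3, 0, -1]

print(same_partition(first_run, second_run))
```

If this returns True for the full label vectors, the two runs found the same clusters and only the numbering differs.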

As for sklearn, if you use an old version, it will (unnecessarily) randomly shuffle the data. So it's not surprising you get a random permutation of the labels.

Usually, if you require stable labels, you are doing something wrong!

But if you really know you need that, implement a simple rule: sort the clusters by their smallest object index and relabel them accordingly. That is, the first object's cluster becomes cluster 0, the second object's cluster (unless it is the same one) becomes cluster 1, and so forth.
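The relabeling rule described above could be sketched as follows. The function name `relabel_by_first_occurrence` is hypothetical, and keeping -1 as the noise label (rather than giving noise a cluster number) is an assumption of this sketch, not something the answer specifies. It accepts any sequence of integer labels, such as `db.labels_`.

```python
def relabel_by_first_occurrence(labels, noise=-1):
    """Renumber clusters 0, 1, 2, ... in order of first appearance; noise stays as-is."""
    mapping = {}  # old label -> new label
    relabeled = []
    for label in labels:
        label = int(label)
        if label == noise:
            # Assumption: noise points keep their label and do not consume a cluster number.
            relabeled.append(noise)
            continue
        if label not in mapping:
            mapping[label] = len(mapping)
        relabeled.append(mapping[label])
    return relabeled


# First twelve labels of the two runs shown in the question.
run_a = [2, 3, 3, 0, 3, 0, 2, 2, 2, 4, 2, -1]
run_b = [0, 2, 2, 1, 2, 1, 0, 0, 0, 3, 0, -1]

canon_a = relabel_by_first_occurrence(run_a)
canon_b = relabel_by_first_occurrence(run_b)
print(canon_a)
print(canon_a == canon_b)
```

After relabeling, two runs that found the same clusters produce identical label lists, so they can be compared directly.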

**Dammio** · Answer 2 · 2020-11-23T01:12:56+00:00

You can use a custom function to normalize the cluster labels.

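import numpy as np

def normalize_cluster_labels(labels):
    # Shift all labels so the smallest one (e.g. DBSCAN's noise label -1) becomes 0.
    # This only offsets the labels; their relative order is unchanged.
    labels = np.asarray(labels)
    min_value = labels.min()
    if min_value < 0:
        labels = labels - min_value
    return labels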
