KMeans silhouette score not working: TypeError: 'numpy.float64' object is not callable

I run the code below on RFM_scaleddf, which holds float values for recency, frequency, and monetary value.

K_clusters = [2,3,4,5,6,7,8,9,10]
silhouette_scores = []

for K in K_clusters:
    initialised_clusters = KMeans(n_clusters = K, max_iter = 30, random_state = 10)
    initialised_clusters.fit(RFM_scaleddf)
    cluster_labels = initialised_clusters.labels_
    score = silhouette_score(RFM_scaleddf, cluster_labels, random_state = 10)
    silhouette_scores.append(score)

The only difference I can see with the RFM dataset is that it has float values, and sklearn's silhouette_score does not work on it. The cell fails with:

---------------------------------------------------------------------------

    TypeError                                 Traceback (most recent call last)
    Cell In [127], line 8
          6 initialised_clusters.fit(RFM_scaleddf)
          7 cluster_labels = initialised_clusters.labels_
    ----> 8 score = silhouette_score(RFM_scaleddf, cluster_labels, random_state = 10)
          9 silhouette_scores.append(score)
    
    TypeError: 'numpy.float64' object is not callable

There is 1 answer

Alexander L. Hayes

The TypeError says that a numpy.float64 object is not callable. That suggests the name silhouette_score was rebound from the imported function to a float somewhere else in the code (for example, in an earlier notebook cell).

That is, check the code for something like this:

silhouette_score = silhouette_score(X, labels)
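
A quick way to confirm this kind of shadowing is to ask Python what the name currently refers to. The sketch below uses a stand-in function (hypothetical, not the real scikit-learn one) that returns a plain float; in the actual notebook the same callable(...) and type(...) checks would be run on the imported silhouette_score, where the type would show up as numpy.float64 instead.

```python
# Stand-in for the imported metric; hypothetical, for illustration only.
def silhouette_score(X, labels):
    return 0.853

# Before the reassignment the name refers to a function.
callable_before = callable(silhouette_score)

# The accidental reassignment: the result overwrites the function name.
silhouette_score = silhouette_score(None, None)

# After the reassignment the name refers to a float, so it can no longer be called.
callable_after = callable(silhouette_score)
type_after = type(silhouette_score).__name__

print(callable_before, callable_after, type_after)  # prints: True False float
```

If the check reports that the name is no longer callable, the function has been overwritten in the current session.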

This minimal reproducible example should produce no error (it needs scikit-learn>=1.2.0 for n_init='auto'):

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
import matplotlib.pyplot as plt

X = np.array([[-0.686,-0.690,-0.434,-0.418,-0.546,-0.506,0.458,0.454,0.610,0.598,0.410,0.922,0.966,0.878,0.778,0.874,-0.914,-0.962,-0.790,-0.790,-0.842,-0.906], [0.415,0.579,0.635,0.495,0.511,0.323,0.399,0.507,0.459,0.263,0.275,-0.937,-0.893,-0.769,-0.861,-0.933,-0.977,-0.857,-0.941,-0.829,-0.869,-0.781]]).T

km = KMeans(n_clusters=4, n_init='auto', random_state=42)
km.fit(X)
labels = km.labels_

fig, ax = plt.subplots(1, 1)
ax.scatter(X[:, 0], X[:, 1], c=labels)
ax.set_title(f"silhouette_score = {silhouette_score(X, labels):.3f}")
plt.show()

Simple example showing a 2D scatter plot with four clusters of yellow, blue, violet, and green points in the four corners. The title of the figure reads silhouette_score = 0.853.

But adding these two lines at the end:

silhouette_score = silhouette_score(X, labels)  # rebinds the name to a numpy.float64
silhouette_score = silhouette_score(X, labels)  # now calls the float, not the function

Raises:

Traceback (most recent call last):
  File "/home/hayesall/answer.py", line 15, in <module>
    silhouette_score = silhouette_score(X, labels)
TypeError: 'numpy.float64' object is not callable
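
A possible fix, sketched under a few assumptions: re-import silhouette_score to restore the function (in a notebook, re-running the import cell or restarting the kernel should have the same effect), and store each result under a different name such as score. The synthetic array X below is only a stand-in for the asker's RFM_scaleddf, and n_init=10 is chosen here so the snippet also runs on scikit-learn versions older than 1.2.

```python
import numpy as np
from sklearn.cluster import KMeans
# Re-importing rebinds the name to the function if it was overwritten earlier.
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the scaled RFM data: four tight blobs of float values.
rng = np.random.default_rng(10)
centers = np.array([[-1.0, -1.0], [-1.0, 1.0], [1.0, -1.0], [1.0, 1.0]])
X = np.vstack([c + 0.1 * rng.standard_normal((20, 2)) for c in centers])

K_clusters = [2, 3, 4, 5, 6]
silhouette_scores = []

for K in K_clusters:
    km = KMeans(n_clusters=K, n_init=10, random_state=10)
    labels = km.fit_predict(X)
    # Keep the result in `score`, never in a variable named `silhouette_score`.
    score = silhouette_score(X, labels)
    silhouette_scores.append(score)

# Pick the K with the highest silhouette score.
best_K = K_clusters[int(np.argmax(silhouette_scores))]
print(best_K, [round(float(s), 3) for s in silhouette_scores])
```

Float inputs are not the problem here: silhouette_score is computed on float data in this sketch, just as in the example above.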