Here is some code to set up the clustering problem:
import numpy as np
import matplotlib.pyplot as plt
# Two classes: concentric noisy circles centered at (a, b) = (2.5, 2.5),
# with radii r1 = 2 (outer) and r2 = 1 (inner), 1000 points each.
X1 = np.zeros(2000)  # x coordinates
X2 = np.zeros(2000)  # y coordinates
r1 = 2; r2 = 1; a = 2.5; b = 2.5
h = np.random.uniform(0, 2*np.pi, 1000)
noise = np.random.normal(0, 0.1, 1000)
X1[:1000] = np.cos(h) * r1 + a + noise
noise = np.random.normal(0, 0.1, 1000)
X2[:1000] = np.sin(h) * r1 + b + noise
h = np.random.uniform(0, 2*np.pi, 1000)
noise = np.random.normal(0, 0.1, 1000)
X1[1000:] = np.cos(h) * r2 + a + noise
noise = np.random.normal(0, 0.1, 1000)
X2[1000:] = np.sin(h) * r2 + b + noise
X = np.array([X1,X2]).T
plt.figure(figsize=(4,4))
plt.scatter(X[:,0],X[:,1])
From the resulting plot, we expect two clusters: all points on the inner circle should belong to one, and all points on the outer circle to the other.
Using scikit-learn, we run spectral clustering with the RBF kernel:
from sklearn.cluster import SpectralClustering
clustering = SpectralClustering(n_clusters=2,assign_labels='kmeans', affinity='rbf',random_state=0).fit(X)
print(clustering.labels_)
plt.figure(figsize=(4,4))
X_C1 = X[clustering.labels_ == 1]  # points assigned to cluster 1
X_C2 = X[clustering.labels_ == 0]  # points assigned to cluster 0
plt.scatter(X_C1[:,0],X_C1[:,1],c="blue")
plt.scatter(X_C2[:,0],X_C2[:,1],c="red")
plt.show()
But spectral clustering does not seem to work here: the result is as bad as plain KMeans clustering. So what is the problem?
The default gamma=1.0 parameter is not high enough for this application. Try gamma=6.0:
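The code that followed this suggestion is not shown here, so the block below is only a minimal sketch of what the change might look like. It rebuilds the two rings from the question with a fixed seed (the seed, the helper `noisy_circle`, and the `true_ring` / `agreement` check are assumptions added for illustration, not part of the original post) and passes `gamma=6.0` to `SpectralClustering`.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(0)  # fixed seed: an assumption, for reproducibility only

def noisy_circle(radius, center, n, rng):
    """Sample n points on a circle, adding Gaussian noise (sigma = 0.1) per axis."""
    theta = rng.uniform(0, 2 * np.pi, n)
    x = np.cos(theta) * radius + center[0] + rng.normal(0, 0.1, n)
    y = np.sin(theta) * radius + center[1] + rng.normal(0, 0.1, n)
    return np.column_stack([x, y])

# Same geometry as the question: outer radius 2, inner radius 1, center (2.5, 2.5).
X = np.vstack([noisy_circle(2, (2.5, 2.5), 1000, rng),
               noisy_circle(1, (2.5, 2.5), 1000, rng)])
true_ring = np.repeat([0, 1], 1000)  # 0 = outer ring, 1 = inner ring

# The RBF affinity is exp(-gamma * ||x - y||^2); a larger gamma makes it more local.
clustering = SpectralClustering(n_clusters=2, assign_labels='kmeans',
                                affinity='rbf', gamma=6.0,
                                random_state=0).fit(X)
labels = clustering.labels_

# Fraction of points whose cluster matches their ring, up to swapping the two labels.
agreement = max(np.mean(labels == true_ring), np.mean(labels != true_ring))
print(f"agreement with the two rings: {agreement:.3f}")
```

The intuition is that the rings are about 1 unit apart, so with gamma=1.0 two points facing each other across the gap still get an affinity of roughly exp(-1) ≈ 0.37, which ties the rings together in the similarity graph. With gamma=6.0 that drops to roughly exp(-6) ≈ 0.0025, while neighbours along the same ring stay strongly connected. The exact value that works will depend on the noise level and the spacing between the rings, so it may need tuning on other data.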