Mean shift algorithm for clustering user-defined data consisting of 3-4 features


I want to cluster data that consists of object names, x_coordinate, y_coordinate and the corresponding temperature. I am trying the mean shift clustering algorithm to group nearby objects by location and by similar temperature, i.e. to identify hot and cold areas. The code and a small data sample follow. With the default settings it gives only a single cluster, and it does not show the graph. I would like to know what might be wrong in the following code:

import numpy as np  
from mpl_toolkits.mplot3d import Axes3D  
import pandas as pd  
from sklearn.decomposition import PCA    
from sklearn.cluster import MeanShift, estimate_bandwidth  
import matplotlib.pyplot as plt  
from itertools import cycle  

data = pd.read_csv("data.csv")

centers = [[1, 1, 1], [0,0,0], [0,0,0]]  
X= data._get_numeric_data()  
bandwidth = estimate_bandwidth()  

ms = MeanShift()  
ms.fit(X)  
labels = ms.labels_  
cluster_centers = ms.cluster_centers_  

print labels  
print cluster_centers  

fig = plt.figure()  
ax = plt.axes(projection='3d')  
x = data['x_cordinate']  
y=data['y_cordinate']  
z=data['tpa']  
c=labels  
ax.scatter(x,y,z, c=c)  
plt.show()  

data.csv:

name,x_cordinate,y_cordinate,temperature
Ctrs3,5189200,6859000,0.3998434286
Ctrs4,5173360,6812800,0.4779542857
Ctrs5,5660440,6812800,0.7044195918
Cstrs3,1935400,5929720,0
Cstrs4,1953880,5929720,0
Cstrs5,491320,2689120,0
Cltrs3,3436240,5884840,0.3998434286
Cltrs4,3296320,5884840,0.4779542857
Cltrs5,5426800,5725120,0.7044195918


There is 1 answer

welch

estimate_bandwidth needs an argument (your data). Does this code run?
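To make that concrete, here is a minimal sketch of the corrected calls, assuming a Python 3 environment with a recent scikit-learn. The sample rows from the question are inlined through io.StringIO (my choice, so the snippet runs without a data.csv file), and the feature columns are selected by name instead of through the private _get_numeric_data helper. For the 3D plot, note that the CSV has a temperature column and no tpa column.

```python
import io

import pandas as pd
from sklearn.cluster import MeanShift, estimate_bandwidth

# Sample rows from the question, inlined so the sketch is self-contained.
csv_text = """name,x_cordinate,y_cordinate,temperature
Ctrs3,5189200,6859000,0.3998434286
Ctrs4,5173360,6812800,0.4779542857
Ctrs5,5660440,6812800,0.7044195918
Cstrs3,1935400,5929720,0
Cstrs4,1953880,5929720,0
Cstrs5,491320,2689120,0
Cltrs3,3436240,5884840,0.3998434286
Cltrs4,3296320,5884840,0.4779542857
Cltrs5,5426800,5725120,0.7044195918
"""
data = pd.read_csv(io.StringIO(csv_text))

# Pick the numeric feature columns explicitly.
X = data[["x_cordinate", "y_cordinate", "temperature"]].to_numpy()

# estimate_bandwidth takes the data as its first argument.
bandwidth = estimate_bandwidth(X)

# The estimate only matters if it is handed to the constructor.
ms = MeanShift(bandwidth=bandwidth)
ms.fit(X)

# Python 3 print calls (the question uses Python 2 print statements).
print(ms.labels_)
print(ms.cluster_centers_)
```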

Anyway, when I get a single cluster like this, I give estimate_bandwidth a smaller value for the quantile parameter than the default 0.3 (and pass that bandwidth estimate to the MeanShift constructor!).
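A rough sketch of the quantile idea follows. It uses synthetic blobs from make_blobs instead of the question's data, because nine rows are too few to show the effect, and cluster_with_quantile is a hypothetical helper name of my own. With very few samples, a very small quantile may drive the estimated bandwidth to zero, so it is worth checking that the estimate is positive before fitting.

```python
from sklearn.cluster import MeanShift, estimate_bandwidth
from sklearn.datasets import make_blobs

# Synthetic 3-feature data (an assumption, standing in for real measurements).
X, _ = make_blobs(
    n_samples=300, centers=4, n_features=3, cluster_std=0.6, random_state=0
)


def cluster_with_quantile(X, quantile):
    """Estimate a bandwidth for the given quantile and fit MeanShift with it."""
    bandwidth = estimate_bandwidth(X, quantile=quantile)
    ms = MeanShift(bandwidth=bandwidth).fit(X)
    return bandwidth, ms.labels_, ms.cluster_centers_


# Compare the default quantile with a smaller one.
for q in (0.3, 0.1):
    bandwidth, labels, centers = cluster_with_quantile(X, q)
    print(f"quantile={q}: bandwidth={bandwidth:.3f}, clusters={len(centers)}")
```

A smaller quantile looks at fewer neighbours per point, so the estimated bandwidth shrinks, which tends to split the data into more clusters.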

If you already know a good bandwidth a priori, you are best off using that directly.
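For the known-bandwidth case, a minimal sketch on the question's sample rows might look like this. The value 1_000_000 is purely hypothetical. It is expressed in coordinate units because, on the raw data, the coordinates (in the millions) dominate the distance and the temperature (between 0 and 1) barely contributes.

```python
import numpy as np
from sklearn.cluster import MeanShift

# x, y and temperature for the nine sample rows in the question.
X = np.array([
    [5189200, 6859000, 0.3998434286],
    [5173360, 6812800, 0.4779542857],
    [5660440, 6812800, 0.7044195918],
    [1935400, 5929720, 0.0],
    [1953880, 5929720, 0.0],
    [491320, 2689120, 0.0],
    [3436240, 5884840, 0.3998434286],
    [3296320, 5884840, 0.4779542857],
    [5426800, 5725120, 0.7044195918],
])

# Hypothetical bandwidth chosen from domain knowledge, in coordinate units.
known_bandwidth = 1_000_000.0

# No estimation step: the known value goes straight to the constructor.
ms = MeanShift(bandwidth=known_bandwidth).fit(X)

print(ms.labels_)
print(ms.cluster_centers_)
```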