I am applying TSNE for dimensionality reduction. I have several features that I reduce to 2 features. Afterwards, I use KMeans to cluster the data. Finally, I use seaborn to plot the clustering results.
To import TSNE I use:
from sklearn.manifold import TSNE
To apply TSNE I use:
features_tsne_32 = TSNE(n_components=2).fit_transform(standarized_data)
After that I use KMeans:
from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=6, **kmeans_kwargs)
kmeans.fit(features_tsne_32)
km_tsne_32 = kmeans.predict(features_tsne_32)
Finally, I have the plot by using:
import seaborn as sns
# plot data with seaborn
facet = sns.lmplot(data=df, x='km_tsne_32_c1', y='km_tsne_32_c2', hue='km_tsne_32',
                   fit_reg=False, legend=True, legend_out=True)
I have this plot:
This plot seems too perfect and globular. Is something wrong with the procedure I followed to produce it, or with the code described above?
Your problem is not specific to t-SNE; it applies to any unsupervised learning algorithm. How do you evaluate its results?
I would say that the only proper way to do this is if you have some prior or expert knowledge on the data. Something like labels, other metadata, even user feedback.
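If you do have labels, one way to use them is to compare the cluster assignments against them. Here is a minimal sketch, assuming ground-truth labels exist. The data is a synthetic stand-in generated with make_blobs, and the names X, y_true and y_pred are only illustrative, not taken from your code.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# Synthetic stand-in for the real data (assumption): 300 points in 6 known groups.
X, y_true = make_blobs(n_samples=300, centers=6, n_features=10, random_state=0)

# Cluster the data; n_init and random_state are fixed only for reproducibility.
kmeans = KMeans(n_clusters=6, n_init=10, random_state=0)
y_pred = kmeans.fit_predict(X)

# Adjusted Rand index: 1.0 means the clusters match the known labels exactly,
# values near 0 mean agreement no better than chance.
ari = adjusted_rand_score(y_true, y_pred)
print(f"Adjusted Rand index: {ari:.3f}")
```

The score ignores how the cluster ids are numbered, so the KMeans output does not have to be aligned with the label values first.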
That being said, regarding your specific plot:
So k-Means is fine, but you probably need to tweak the parameters of t-SNE.
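As a rough sketch of what tweaking could look like, the loop below recomputes the embedding for a few perplexity values and clusters each result. The perplexity values (5, 30, 50) and the synthetic make_blobs data are assumptions for illustration only; in your case you would pass standarized_data instead of X.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.manifold import TSNE

# Synthetic stand-in for the standardized feature matrix (assumption).
X, _ = make_blobs(n_samples=200, centers=6, n_features=10, random_state=0)

embeddings = {}
labels = {}
for perplexity in (5, 30, 50):  # illustrative values; must be smaller than n_samples
    # One 2-D embedding per perplexity setting.
    tsne = TSNE(n_components=2, perplexity=perplexity, random_state=0)
    embeddings[perplexity] = tsne.fit_transform(X)

    # Cluster each embedding the same way so the plots are comparable.
    kmeans = KMeans(n_clusters=6, n_init=10, random_state=0)
    labels[perplexity] = kmeans.fit_predict(embeddings[perplexity])
```

Plotting each entry of embeddings coloured by the matching entry of labels shows how much the apparent cluster shapes depend on the t-SNE settings rather than on the data.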