Why do my clusters overlap? Is it because sklearn.cluster.KMeans stops iterating too soon?
graph: KMeans clusters of the iris flowers, plotted by sepal measurements
Some clarifications:
- The data is 4D and the values are standardized (@OmG pointed me to the answer to my question).
- I've uploaded 3 files here: github repository
  - code.py: the minimum code needed for this question
  - code_notebook.ipynb: same as code.py, plus other functions
  - iris-dataset.csv: the dataset
Because the example I was working from always plotted only the first 2 columns, I thought I was running the clustering on only those two variables. Thanks for pointing me to the answer!
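
For anyone hitting the same confusion, here is a minimal sketch of the effect. It is not the code from the repository: it assumes scikit-learn's bundled copy of the iris data (`load_iris`) rather than iris-dataset.csv, 3 clusters, and arbitrary `n_init` and `random_state` values chosen only for illustration. It fits KMeans on all 4 standardized columns, then checks how many points would look "misplaced" if you judged them only by the first 2 columns, which is what a 2D scatter plot of those columns shows.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

# Assumption: sklearn's bundled iris data stands in for iris-dataset.csv.
X = StandardScaler().fit_transform(load_iris().data)  # 150 rows, 4 columns

# Cluster on ALL 4 standardized columns (n_clusters, n_init and
# random_state are illustrative choices, not taken from the question).
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
labels_4d = km.labels_

# What a plot of the first 2 columns "sees": the points and the
# centroids projected onto those 2 columns only.
X_2d = X[:, :2]
centroids_2d = km.cluster_centers_[:, :2]

# For each point, find the nearest projected centroid in the 2D view.
dist_2d = np.linalg.norm(X_2d[:, None, :] - centroids_2d[None, :, :], axis=2)
nearest_in_2d = dist_2d.argmin(axis=1)

# Points whose 4D cluster differs from their nearest centroid in 2D
# are the ones that appear to sit inside another cluster on the plot.
n_mismatch = int((nearest_in_2d != labels_4d).sum())
print(f"{n_mismatch} of {len(X)} points look misplaced in the 2D view")
```

Any points counted here are not a sign that KMeans stopped early. The clusters are separated in 4D, and the overlap only appears because the plot drops two of the dimensions used for clustering.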