I am doing a project by using k-means and my professor suggested kdtree. I found this implementation of kdtree in python (i know that there is also in scipy, but i couldn't find any sample implementation). My question is the same as the tittle, is it used kdtree for speeding up k-means, or i am wrong?
data = [(2,2),(1,0),(2,3),(10,5),(59,8),(4,2)]
tree = KDTree.construct_from_data(data)
nearest = tree.query(query_point=(5,4), t=3)
print nearest
output:
[(4, 2), (2, 3), (2, 2)]
As "Making k-means even faster", p 137, paper shows, kd-tree can be used for k means algorithm speed up for low-dimensional data, while straightforward Lloyd's algorithm is more efficient for higher dimension.
See explanations in paper.
I suggest you to use one of established k-means implementation and worry about speed improvement only if you run into severe issues. For example, afaik, sklearn's KMeans is based on Lloyd's original algorithm.