Is kdtree used for speeding k-means clustering or not?

Question

Is kdtree used for speeding k-means clustering or not?

2.7k views Asked by Xenor At 14 December 2013 at 20:13

I am doing a project by using k-means and my professor suggested kdtree. I found this implementation of kdtree in python (i know that there is also in scipy, but i couldn't find any sample implementation). My question is the same as the tittle, is it used kdtree for speeding up k-means, or i am wrong?

data = [(2,2),(1,0),(2,3),(10,5),(59,8),(4,2)]

tree = KDTree.construct_from_data(data)
nearest = tree.query(query_point=(5,4), t=3)
print nearest

output:

[(4, 2), (2, 3), (2, 2)]

Original Q&A

There are 2 answers

Has QUIT--Anony-Mousse On 15 December 2013 at 12:07

It can be used, but it's nontrivial. Most people only implement the straightforward un-accelerated solution.

The problem is that most kd-tree implementations only support nearest-neighbor queries.

This only pays off if you have a large number of clusters k, and build the index on the clusters.

For full kd-tree-k-means acceleration, you would need to implement a bipartite NN-join, where you would both have an index on the points and on the cluster centers. I'm not aware of any kd-tree implementation that would support this.

**alko** · Accepted Answer · 2013-12-14T21:58:25+00:00

As "Making k-means even faster", p 137, paper shows, kd-tree can be used for k means algorithm speed up for low-dimensional data, while straightforward Lloyd's algorithm is more efficient for higher dimension.

With high-dimensional data, indexing schemes such as k-d tree do not work well

See explanations in paper.

I suggest you to use one of established k-means implementation and worry about speed improvement only if you run into severe issues. For example, afaik, sklearn's KMeans is based on Lloyd's original algorithm.

TechQA.

Is kdtree used for speeding k-means clustering or not?

There are 2 answers

Related Questions in PYTHON

Related Questions in CLUSTER-ANALYSIS

Related Questions in K-MEANS

Related Questions in KDTREE

Popular Questions

Popular Tags

Trending Questions