Clustering based on Pearson correlation

Question

3.7k views · Asked by Bankelaal on 11 June 2015 at 10:36

I have a use case where traffic data is collected every 15 minutes for 1 month. This data is collected for various resources in a network.

Now I need to group resources that are similar (based on their traffic usage pattern from 00:00 to 23:45).

One way to check whether two resources have similar traffic behavior is to compute the Pearson correlation coefficient for every pair of resources and build an N*N matrix.
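To make the N*N matrix concrete, here is a minimal Java sketch. It assumes each resource has already been reduced to one daily profile of equal length (for example, 96 values for the 15-minute slots from 00:00 to 23:45, averaged over the month); that reduction, and the names `PearsonMatrix`, `pearson` and `correlationMatrix`, are illustrative assumptions rather than part of the question.

```java
import java.util.Arrays;

final class PearsonMatrix {

    /** Pearson correlation of two equal-length series; NaN if either series is constant. */
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length == 0) {
            throw new IllegalArgumentException("series must be non-empty and of equal length");
        }
        double meanX = Arrays.stream(x).average().orElse(0.0);
        double meanY = Arrays.stream(y).average().orElse(0.0);
        double sxy = 0.0, sxx = 0.0, syy = 0.0;
        for (int i = 0; i < x.length; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        // A constant series has zero variance, so the coefficient is undefined.
        if (sxx == 0.0 || syy == 0.0) {
            return Double.NaN;
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    /** Symmetric N*N matrix of pairwise correlations; profiles[i] is the profile of resource i. */
    static double[][] correlationMatrix(double[][] profiles) {
        int n = profiles.length;
        double[][] corr = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = i; j < n; j++) {
                double r = pearson(profiles[i], profiles[j]);
                corr[i][j] = r;
                corr[j][i] = r;
            }
        }
        return corr;
    }
}
```

Calling `PearsonMatrix.correlationMatrix(profiles)` with one row per resource returns the matrix; entries involving a constant profile come back as `NaN` and need separate handling.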

My question is: which method should I apply to cluster similar resources? Existing K-Means clustering methods are based on Euclidean distance. Which algorithm can I use to cluster based on similarity of pattern?

Any thoughts or links to possible solutions are welcome. I want to implement this in Java.

There is 1 answer

**Has QUIT--Anony-Mousse** · Accepted Answer · 2015-06-11T17:02:23+00:00

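Pearson correlation is not compatible with the mean: the arithmetic mean that k-means uses as a cluster center minimizes squared deviations, not correlation distance. Thus, k-means should not be used. It is appropriate for least squares, but not for correlation.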

Instead, just use hierarchical agglomerative clustering (HAC), which works with Pearson correlation matrices just fine. Or DBSCAN: it also works with arbitrary distance functions. You can set a threshold: an absolute correlation of, e.g., +0.75 may be a desirable value for epsilon (expressed as a distance, that is 0.25 if the distance is defined as 1 - r). But to get a feeling for your distance function, dendrograms as used by HAC are probably easier.
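As a rough illustration of the HAC route, the following Java sketch clusters directly from a correlation matrix. Several choices here are assumptions, not part of the advice above: the distance is taken as 1 - r (use 1 - |r| if negatively correlated patterns should count as similar), the linkage is average linkage, and merging stops at a distance cutoff instead of producing a full dendrogram. The class and method names are hypothetical, and the naive search is cubic in the number of resources, so it is only meant for small N.

```java
import java.util.ArrayList;
import java.util.List;

final class CorrelationHac {

    /** Converts a correlation matrix into a distance matrix using d = 1 - r (an assumed choice). */
    static double[][] toDistance(double[][] corr) {
        int n = corr.length;
        double[][] dist = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                dist[i][j] = (i == j) ? 0.0 : 1.0 - corr[i][j];
            }
        }
        return dist;
    }

    /** Average-linkage distance: mean pairwise distance between members of two clusters. */
    static double averageLinkage(List<Integer> a, List<Integer> b, double[][] dist) {
        double sum = 0.0;
        for (int i : a) {
            for (int j : b) {
                sum += dist[i][j];
            }
        }
        return sum / (a.size() * b.size());
    }

    /** Merges the closest pair of clusters repeatedly until no pair is within maxDistance. */
    static List<List<Integer>> cluster(double[][] dist, double maxDistance) {
        List<List<Integer>> clusters = new ArrayList<>();
        for (int i = 0; i < dist.length; i++) {
            List<Integer> single = new ArrayList<>();
            single.add(i);
            clusters.add(single);
        }
        while (clusters.size() > 1) {
            int bestA = -1;
            int bestB = -1;
            double best = Double.POSITIVE_INFINITY;
            for (int a = 0; a < clusters.size(); a++) {
                for (int b = a + 1; b < clusters.size(); b++) {
                    double d = averageLinkage(clusters.get(a), clusters.get(b), dist);
                    // NaN distances (undefined correlations) never compare as smaller,
                    // so such pairs are never merged.
                    if (d < best) {
                        best = d;
                        bestA = a;
                        bestB = b;
                    }
                }
            }
            if (bestA < 0 || best > maxDistance) {
                break;
            }
            // bestB > bestA, so removing bestB does not shift the index of bestA.
            clusters.get(bestA).addAll(clusters.remove(bestB));
        }
        return clusters;
    }
}
```

With this distance, a cutoff of 0.25 corresponds to the +0.75 correlation threshold mentioned above: `CorrelationHac.cluster(CorrelationHac.toDistance(corr), 0.25)` returns groups of resource indices. Recording the merge distance at each step would give the heights needed to draw a dendrogram.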

Beware that Pearson correlation is not defined for constant patterns, because their standard deviation is zero. If you have a resource with 0 usage throughout, its distance to every other resource will be undefined.
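One way to deal with this, sketched below in Java, is to set constant profiles aside before computing any correlations and decide separately what to do with them (for instance, report them as their own "flat" group). The class and method names are hypothetical, and treating only exactly constant series as a problem is an assumption; nearly constant series may also give unstable correlations.

```java
import java.util.ArrayList;
import java.util.List;

final class ConstantProfileFilter {

    /** True if every value equals the first one (an empty series also counts as constant). */
    static boolean isConstant(double[] series) {
        for (int i = 1; i < series.length; i++) {
            if (series[i] != series[0]) {
                return false;
            }
        }
        return true;
    }

    /** Indices of the profiles for which Pearson correlation is defined. */
    static List<Integer> nonConstantIndices(double[][] profiles) {
        List<Integer> keep = new ArrayList<>();
        for (int i = 0; i < profiles.length; i++) {
            if (!isConstant(profiles[i])) {
                keep.add(i);
            }
        }
        return keep;
    }
}
```

Only the profiles at the returned indices would then go into the correlation matrix and the clustering step.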
