How to remove noise using MeanShift Clustering Technique?


I'm using mean shift clustering to remove unwanted noise from my input data. The data can be found here. Here is what I have tried so far:

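import numpy as np
from sklearn.cluster import MeanShift

# scikit-learn expects shape (n_samples, n_features), so load the file
# as 500 rows of 3 coordinates; unpack=True would give (3, 500), i.e. 3 samples
data = np.loadtxt('model.txt')
ms = MeanShift()  # bandwidth is estimated from the data when not given
ms.fit(data)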

After trying several different bandwidth values, I get only one cluster, but the outliers and noise shown in the picture are supposed to be in a different cluster.

enter image description here

When I decrease the bandwidth a little more, I end up with this, which is again not what I was looking for.

enter image description here

Can anyone help me with this?

3

There are 3 answers

6
Has QUIT--Anony-Mousse On

Mean-shift is not meant to remove low-density areas.

It tries to move all data to the most dense areas.

If there is one single most dense point, then everything should move there, and you get only one cluster.

Try a different method. Maybe remove the outliers first.

1
fferri On

You can remove outliers before using mean shift.

Statistical removal

For example, fix the number of neighbors to analyze for each point (e.g. 50) and a standard deviation multiplier (e.g. 1). For each point, compute the mean distance to its nearest neighbors. Every point whose mean neighbor distance exceeds the global mean of these distances by more than 1 standard deviation is marked as an outlier and removed. This technique is used in libpcl, in the class pcl::StatisticalOutlierRemoval, and a tutorial can be found here.
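A minimal Python sketch of this statistical filter, assuming scikit-learn's NearestNeighbors stands in for PCL. The function name statistical_outlier_mask, its parameters, and the synthetic test data are made up for illustration and are not part of either library:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def statistical_outlier_mask(points, k=50, std_mul=1.0):
    """Return a boolean mask that is True for the points to keep (hypothetical helper)."""
    points = np.asarray(points, dtype=float)
    nn = NearestNeighbors(n_neighbors=k).fit(points)
    # kneighbors() without arguments does not count a point as its own neighbor
    dist, _ = nn.kneighbors()
    mean_dist = dist.mean(axis=1)  # mean distance to the k nearest neighbors
    # Global threshold: mean of the mean distances plus std_mul standard deviations
    threshold = mean_dist.mean() + std_mul * mean_dist.std()
    return mean_dist <= threshold


# Synthetic data (assumption): one dense blob plus three far-away stray points
rng = np.random.default_rng(0)
dense = rng.normal(0.0, 0.1, size=(300, 3))
stray = np.array([[5.0, 5.0, 5.0], [-6.0, 4.0, 0.0], [0.0, -7.0, 3.0]])
points = np.vstack([dense, stray])

keep = statistical_outlier_mask(points, k=20, std_mul=1.0)
filtered = points[keep]  # pass this to MeanShift instead of the raw data
```

The number of neighbors and the multiplier are tuning knobs; a smaller multiplier removes more points.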


Deterministic removal (radius based)

A simpler technique consists of specifying a radius R and a minimum number of neighbors N. All points that have fewer than N neighbors within a radius of R are marked as outliers and removed. This technique is also used in libpcl, in the class pcl::RadiusOutlierRemoval, and a tutorial can be found here.
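The radius-based variant can be sketched the same way, again assuming scikit-learn's NearestNeighbors rather than PCL. The helper name radius_outlier_mask and the toy grid are illustrative only:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def radius_outlier_mask(points, radius, min_neighbors):
    """Return a boolean mask that is True for the points to keep (hypothetical helper)."""
    points = np.asarray(points, dtype=float)
    nn = NearestNeighbors(radius=radius).fit(points)
    # radius_neighbors() without a query does not count a point as its own neighbor
    neighbors = nn.radius_neighbors(return_distance=False)
    counts = np.array([len(idx) for idx in neighbors])
    return counts >= min_neighbors


# Toy data (assumption): a flat 5 x 5 grid with unit spacing plus one isolated point
xs, ys = np.meshgrid(np.arange(5.0), np.arange(5.0))
grid = np.column_stack([xs.ravel(), ys.ravel(), np.zeros(25)])
points = np.vstack([grid, [[50.0, 50.0, 50.0]]])

# Each grid corner has 3 neighbors within 1.5; the isolated point has none
keep = radius_outlier_mask(points, radius=1.5, min_neighbors=3)
filtered = points[keep]
```

Here R and N have to be chosen by hand from the expected point density, which is why the result is deterministic but less adaptive than the statistical filter.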


0
user2987264 On

Set this parameter to False:

cluster_all : bool, default=True
If true, then all points are clustered, even those orphans that are not within any kernel. Orphans are assigned to the nearest kernel. If false, then orphans are given cluster label -1.
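A rough sketch of how this could look with synthetic data. One caveat, as far as I can tell from how scikit-learn seeds the algorithm: with the default seeding every point is a seed, so an isolated point can end up as its own one-point cluster and may not be labelled -1. The sketch therefore also sets bin_seeding=True and min_bin_freq so that sparse bins are not used as seeds. The bandwidth, min_bin_freq value, and data below are assumptions chosen for the example, not values from the question:

```python
import numpy as np
from sklearn.cluster import MeanShift

# Synthetic data (assumption): one dense blob plus two far-away stray points
rng = np.random.default_rng(0)
dense = rng.normal(0.0, 0.1, size=(200, 3))
stray = np.array([[5.0, 5.0, 5.0], [-6.0, 4.0, 0.0]])
points = np.vstack([dense, stray])

ms = MeanShift(
    bandwidth=0.5,       # illustrative value, tune for real data
    bin_seeding=True,    # seed from a coarse grid instead of from every point
    min_bin_freq=5,      # ignore grid bins holding fewer than 5 points
    cluster_all=False,   # points farther than the bandwidth from every center get label -1
)
labels = ms.fit_predict(points)

inliers = points[labels != -1]  # points that belong to some cluster
noise = points[labels == -1]    # orphans left unassigned
```

On real data the bandwidth decides how far from a cluster center a point may lie before it is treated as an orphan, so it is worth plotting the labels for a few values.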