Outlier detection using ELKI


I am using the ELKI data mining software for outlier detection. It offers many outlier detection techniques, but they all give the same results: the same outliers with every technique, and the only difference is the size of the circle around the points, as shown in the figures below. I am using the mouse head data set provided on the ELKI website. In the data set, every point is labeled with its cluster name: ear_left, ear_right, head, or noise. If I change the label of a noise point to ear_right, the result then shows that outlier point as ear_right. I have changed 5 of the 10 noise labels to ear_right.

Here is the result of using the KNN and LDOF outlier detection techniques on the modified data set in ELKI:

[Image: KNN and LDOF outlier detection results in ELKI]

Is this a problem with the software, or am I doing something wrong? Has anyone tried using it for outlier detection? Is there any good software that can perform outlier detection with different algorithms such as LOF, LDOF, and KNN, or somewhere I could find source code for these techniques?


There is 1 answer

Erich Schubert (best answer)

This is a very simplistic data set.

It is not surprising that the methods all work more or less well, because this is a toy data set, not real data. On real data, outlier detection is much, much harder.

Note that the implementations in ELKI assign numerical scores. They do not produce a yes/no outlier decision, but that is trivial to derive from the scores.
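To illustrate what "assigning a numerical score" means, here is a minimal Python sketch of a kNN-distance outlier score. It is not ELKI's implementation (ELKI is written in Java and uses index structures); the function name `knn_outlier_scores` and the five sample points are made up for this example. Conventions also differ on whether a point counts as its own neighbor; this sketch assumes it does not.

```python
import math

def knn_outlier_scores(points, k):
    """Score each point by the distance to its k-th nearest neighbor.

    Brute-force O(n^2) sketch; larger scores mean "more outlying".
    """
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point (p itself is excluded).
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores

# Hypothetical toy data: a unit square plus one far-away point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
scores = knn_outlier_scores(points, k=2)
print(scores)
```

The far-away point gets by far the largest score, but the method itself never says "this is an outlier"; it only ranks.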

If you want a binary result, you can for example set the visualization scaling parameter to only visualize the top k results. In other cases, you may want to read the actual papers. For example, the authors of LOCI suggest treating objects with a score larger than 3 as outliers. (Unfortunately, most methods do not have a particularly easy interpretation available.)
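Both ways of getting a binary result can be sketched in a few lines of Python. This is only an illustration under the assumption that larger scores mean "more outlying"; the helper names and the score values are invented, and the cutoff of 3 is only meaningful for a method whose authors recommend it.

```python
def top_k_outliers(scores, k):
    """Return the indices of the k highest-scoring objects, in index order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(order[:k])

def threshold_outliers(scores, threshold):
    """Return the indices of all objects whose score exceeds the threshold."""
    return [i for i, s in enumerate(scores) if s > threshold]

# Hypothetical scores, not the output of any real run.
scores = [0.9, 1.1, 4.2, 1.0, 3.5, 0.8]

top2 = top_k_outliers(scores, 2)
above3 = threshold_outliers(scores, 3.0)
print(top2, above3)
```

Top-k always flags exactly k objects, even if none of them is unusual; a threshold can flag none or many, but only makes sense when the score has an interpretable scale.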

Don't think in the classification box. Outlier detection is an explorative technique, not classification.

ELKI can also evaluate the quality of an outlier method using a number of measures, such as ROC AUC, ROC curves, Precision@k, average precision (AveP), and maximum F1.
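For intuition, three of these measures can be computed by hand from a score list and ground-truth labels. The following Python sketch uses the textbook definitions and is not ELKI's evaluation code; the function names, scores, and labels are invented, and it assumes higher scores mean "more outlying" and that the labels contain at least one outlier and one inlier.

```python
def roc_auc(scores, labels):
    """Probability that a random outlier outscores a random inlier (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def precision_at_k(scores, labels, k):
    """Fraction of true outliers among the k highest-scoring objects."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    return sum(1 for _, y in ranked[:k] if y) / k

def average_precision(scores, labels):
    """Mean of the precision values at the rank of each true outlier."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y:
            hits += 1
            total += hits / rank
    return total / hits

# Hypothetical example: 1 marks a labeled outlier (e.g. "noise"), 0 an inlier.
scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
labels = [1, 0, 1, 0, 0, 0]

auc = roc_auc(scores, labels)
p_at_2 = precision_at_k(scores, labels, 2)
avep = average_precision(scores, labels)
print(auc, p_at_2, avep)
```

All three measures work on the ranking induced by the scores, which is why no yes/no threshold is needed to compare methods.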