I'm trying to use DBSCANClusterer from org.apache.commons.math3.ml.clustering. The cluster method returns a list of clusters, but the size of the list is always 0 for me. What am I doing wrong? Below is my test code:
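```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.commons.math3.ml.clustering.Cluster;
import org.apache.commons.math3.ml.clustering.DBSCANClusterer;
import org.apache.commons.math3.ml.clustering.DoublePoint;

public class ClusterTest {
    public static void main(String[] args) throws IOException {
        // eps = 0.05, minPts = 15
        DBSCANClusterer<DoublePoint> dbscan = new DBSCANClusterer<DoublePoint>(.05, 15);
        List<DoublePoint> points = getData();
        List<Cluster<DoublePoint>> clusters = dbscan.cluster(points);
        for (Cluster<DoublePoint> c : clusters) {
            System.out.println(c.getPoints());
        }
    }

    private static List<DoublePoint> getData() throws IOException {
        List<DoublePoint> data = new ArrayList<DoublePoint>();
        try (BufferedReader reader = new BufferedReader(new FileReader("clust.txt"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] l = line.split("\t");
                try {
                    // DoublePoint wraps the array without copying it,
                    // so every point needs its own array
                    double[] d = new double[2];
                    d[0] = Double.parseDouble(l[0]);
                    d[1] = Double.parseDouble(l[1]);
                    data.add(new DoublePoint(d));
                } catch (NumberFormatException | ArrayIndexOutOfBoundsException e) {
                    // skip lines that are not two tab-separated numbers
                }
            }
        }
        return data;
    }
}
```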
The file clust.txt contains two columns, the X and Y values, separated by a tab. I tried a few different data sets and I always get 0.
Try the version in ELKI instead. Apache Commons Math is unfortunately not very good; I moved away from commons-math because of various small issues, and ELKI works much better for me.
From a quick look, commons-math is still pretty much dead when it comes to cluster analysis; it was last touched for MATH-917. The DBSCAN code there is still quite inefficient. In the previous version, DBSCAN used only deprecated classes, and it has received only about 4 commits over x years.
If you don't get any clusters, your epsilon is probably too small and your minPts too high. On top of that, the commons-math implementation of DBSCAN discards all noise objects (it returns only the clusters), and that is probably what you are getting: everything is noise.
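Because `cluster` returns only the clusters, a run in which every point is noise looks exactly like "no result". The following is a minimal plain-Java sketch (not the commons-math or ELKI code) that keeps noise visible by labelling it -1. It also computes the sorted k-distance values that are commonly used to choose epsilon. Assumptions: 2-D points, Euclidean distance, and a neighbourhood that counts the point itself, as in the original DBSCAN paper (implementations differ on this). The class name, the method names and the toy data (two 5x4 grids with spacing 0.1) are all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

public class DbscanSketch {
    static final int NOISE = -1;
    static final int UNVISITED = 0;

    static double dist(double[] a, double[] b) {
        return Math.hypot(a[0] - b[0], a[1] - b[1]); // 2-D Euclidean distance
    }

    // All points within eps of p, counting p itself (as in the original DBSCAN paper).
    static List<Integer> neighbors(double[][] pts, int p, double eps) {
        List<Integer> result = new ArrayList<>();
        for (int q = 0; q < pts.length; q++) {
            if (dist(pts[p], pts[q]) <= eps) {
                result.add(q);
            }
        }
        return result;
    }

    // Label per point: 1..k for clusters, NOISE (-1) for points that belong to no cluster.
    static int[] dbscan(double[][] pts, double eps, int minPts) {
        int[] label = new int[pts.length]; // Java zero-fills, i.e. UNVISITED
        int cluster = 0;
        for (int p = 0; p < pts.length; p++) {
            if (label[p] != UNVISITED) continue;
            List<Integer> seeds = neighbors(pts, p, eps);
            if (seeds.size() < minPts) { // not a core point (may later become a border point)
                label[p] = NOISE;
                continue;
            }
            label[p] = ++cluster;
            Deque<Integer> queue = new ArrayDeque<>(seeds);
            while (!queue.isEmpty()) {
                int q = queue.poll();
                if (label[q] == NOISE) label[q] = cluster; // reached from a core point: border point
                if (label[q] != UNVISITED) continue;
                label[q] = cluster;
                List<Integer> qNeighbors = neighbors(pts, q, eps);
                if (qNeighbors.size() >= minPts) queue.addAll(qNeighbors); // q is core: keep expanding
            }
        }
        return label;
    }

    // Sorted distance from each point to its (minPts-1)-th nearest other point;
    // a "knee" in these values is a common heuristic for picking eps.
    static double[] sortedKDistances(double[][] pts, int minPts) {
        double[] kd = new double[pts.length];
        for (int p = 0; p < pts.length; p++) {
            double[] d = new double[pts.length];
            for (int q = 0; q < pts.length; q++) d[q] = dist(pts[p], pts[q]);
            Arrays.sort(d); // d[0] == 0 is p itself
            kd[p] = d[Math.min(minPts - 1, pts.length - 1)];
        }
        Arrays.sort(kd);
        return kd;
    }

    // Hypothetical toy data: two 5x4 grids with spacing 0.1, far apart.
    static double[][] twoBlobs() {
        List<double[]> pts = new ArrayList<>();
        for (double[] c : new double[][] {{0, 0}, {5, 5}}) {
            for (int i = 0; i < 5; i++) {
                for (int j = 0; j < 4; j++) {
                    pts.add(new double[] {c[0] + i * 0.1, c[1] + j * 0.1}); // fresh array per point
                }
            }
        }
        return pts.toArray(new double[0][]);
    }

    static void report(double[][] pts, double eps, int minPts) {
        int[] label = dbscan(pts, eps, minPts);
        int clusters = Math.max(0, Arrays.stream(label).max().orElse(0));
        long noise = Arrays.stream(label).filter(l -> l == NOISE).count();
        System.out.println("eps=" + eps + " minPts=" + minPts + " -> "
                + clusters + " clusters, " + noise + " noise points");
    }

    public static void main(String[] args) {
        double[][] pts = twoBlobs();
        report(pts, 0.05, 15); // eps=0.05 minPts=15 -> 0 clusters, 40 noise points
        report(pts, 0.5, 5);   // eps=0.5 minPts=5 -> 2 clusters, 0 noise points
        double[] kd = sortedKDistances(pts, 15);
        System.out.printf("14-NN distances range from %.3f to %.3f%n", kd[0], kd[kd.length - 1]);
    }
}
```

With a grid spacing of 0.1, an eps of 0.05 leaves every point alone in its neighbourhood, so all 40 points become noise and the result has zero clusters. With eps=0.5 and minPts=5, both groups are found. Every 14-NN distance printed for minPts=15 is well above 0.05, which is the kind of hint the sorted k-distances give when eps is too small.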