I tried to follow this tutorial on using ELKI with pre-computed distances for clustering.
http://elki.dbs.ifi.lmu.de/wiki/HowTo/PrecomputedDistances
I used the following set of command line options:
-dbc.filter FixedDBIDsFilter -dbc.startid 0 -algorithm clustering.OPTICS
-algorithm.distancefunction external.FileBasedDoubleDistanceFunction
-distance.matrix /path/to/matrix -optics.minpts 5 -resulthandler ResultWriter
ELkI fails with a configuration error saying db.in file is needed to make the computation.
The following configuration errors prevented execution:
No value given for parameter "dbc.in":
Expected: The name of the input file to be parsed.
No value given for parameter "parser.distancefunction":
Expected: Distance function used for parsing values.
My question is what is db.in file? Why should I provide it in addition to the distance matrix file since the pair-wise distance matrix file completely specifies all the information about the point cloud. (also I don't have access to any other information other than the pair-wise distance information).
What should I do about db.in? Should I override it, or specify some dummy information etc. Kindly help me understand.
thank you.
This is documented in the ELKI HowTos:
http://elki.dbs.ifi.lmu.de/wiki/HowTo/PrecomputedDistances
However, there is a bug (patch is on the howto page, and will be in the next release) so you right now can't fully use this; as a workaround you can use a text file that enumerates the objects.
The reason for this is that ELKI is designed to work on multi-relational data. It's not just processing matrixes. But some algorithms may e.g. need a geographic representation of an object, some measurements for this object, and a label for evaluation. That is three relations.
What the DBIDRange data source essentially does is create a single "fake" relation that is just the DBIDs 0 to 99. On algorithms that don't need actual data, but only distances (e.g. LOF or DBSCAN or OPTICS), it is sufficient to have object IDs and a distance matrix.