I have a set of data distributed on a sphere and I am trying to understand what metric must be passed to the DBSCAN function distributed by scikit-learn. It cannot be the Euclidean metric, because the points are not distributed in a Euclidean space. Is there, in the sklearn package, a metric implemented for such cases, or is dividing the data into small subsets the easiest (if long and tedious) way to proceed?
P.S. I am a noob at python
P.P.S. In case I "precompute" the metric, in what form do I have to submit my precomputed data? Like this?

          event1                    event2                    ...
event1    0                         distance(event1,event2)   ...
event2    distance(event1,event2)   0                         ...
Please help!
Have you tried metric="precomputed"? Then pass the distance matrix to the DBSCAN.fit function instead of the data. From the documentation: