I am trying to fit an OPTICS clustering model to my data using Python's sklearn:
from sklearn.cluster import OPTICS, cluster_optics_dbscan
from sklearn.preprocessing import StandardScaler
x = StandardScaler().fit_transform(data.loc[:, features])
op = OPTICS(max_eps=20, min_samples=10, xi=0.1)
op = op.fit(x)
From this fitted model, I get the reachability distances (op.reachability_) and the ordering (op.ordering_) of the points, and also the cluster labels (op.labels_).
Now, I want to check how the clusters would vary by changing the parameter xi (0.1 in this case). Can I do this without refitting the model again and again with different values of xi (which takes a lot of time)?
Or, in other words, is there a scikit-learn function that takes the reachability distances (op.reachability_), the ordering (op.ordering_) of the points and xi as input and outputs the cluster labels?
I found the function cluster_optics_dbscan, which "performs DBSCAN extraction for an arbitrary epsilon given reachability-distances, core-distances and ordering and epsilon" (not quite what I want).
You do need to call the fit method at least once, since it does the actual cluster computation (reachability distances and ordering), as stated in the function description.
However, if you look at the OPTICS module, the cluster_optics_xi function "automatically extract clusters according to the Xi-steep method". It calls the private helpers _xi_cluster and _extract_xi_labels and takes xi as an input parameter. So, by using these functions and refactoring a bit, you may be able to achieve what you want.
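As a rough sketch of that idea: recent scikit-learn versions expose cluster_optics_xi publicly in sklearn.cluster, so it may be enough to fit once and then call it with the stored reachability_, predecessor_ and ordering_ arrays. The synthetic data from make_blobs and the helper name labels_for_xi are assumptions made for illustration only; they are not part of the question or of scikit-learn.

```python
import numpy as np
from sklearn.cluster import OPTICS, cluster_optics_xi
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for data.loc[:, features] (an assumption for this sketch).
raw, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=0)
x = StandardScaler().fit_transform(raw)

min_samples = 10
# Fit once: this is the expensive step that computes reachability and ordering.
op = OPTICS(max_eps=20, min_samples=min_samples, xi=0.1).fit(x)


def labels_for_xi(model, xi, min_samples):
    """Hypothetical helper: re-extract Xi-steep labels from a fitted OPTICS model."""
    labels, _clusters = cluster_optics_xi(
        reachability=model.reachability_,
        predecessor=model.predecessor_,
        ordering=model.ordering_,
        min_samples=min_samples,
        xi=xi,
    )
    return labels


# Try several xi values without refitting the model.
for xi in (0.01, 0.05, 0.1, 0.2):
    labels = labels_for_xi(op, xi, min_samples)
    n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
    print(f"xi={xi}: {n_clusters} clusters")
```

Note that min_samples has to match the value used when fitting, and that any non-default min_cluster_size or predecessor_correction passed to OPTICS would need to be forwarded as well for the labels to match what fit would produce.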