How much matrix size the function Spectral clustering of Scikit learn can handle?

Question

How much matrix size the function Spectral clustering of Scikit learn can handle?

3.3k views Asked by Gunjan naik At 02 January 2017 at 15:03

I am using function 'Spectral clustering' of scikit learn. I am able to perform clustering for 8100 by 8100 matrix but this function throws error for 10000 by 10000 matrix.

Has anyone used this function for large matrix?

edit: I got following error message:

    Not enough memory to perform factorization.
    Traceback (most recent call last):
    File "combined_code_img.py", line 287, in <module>
    labels=spectral.fit_predict(Affinity)
    File "/root/anaconda/lib/python2.7/site-packages/sklearn/base.py",        
    line 410, in fit_predict
    self.fit(X)
    File "/root/anaconda/lib/python2.7/site-packages/sklearn/cluster/spectral.py", line 463, in fit
    assign_labels=self.assign_labels)
    File "/root/anaconda/lib/python2.7/site-packages/sklearn/cluster/spectral.py", line 258, in spectral_clustering
    eigen_tol=eigen_tol, drop_first=False)
    File "/root/anaconda/lib/python2.7/site-packages/sklearn/manifold/spectral_embedding_.py", line 265, in spectral_embedding
tol=eigen_tol, v0=v0)
    File "/root/anaconda/lib/python2.7/site-packages/scipy/sparse/linalg/eigen/arpack/arpack.py", line 1560, in eigsh
symmetric=True, tol=tol)
    File "/root/anaconda/lib/python2.7/site-packages/scipy/sparse/linalg/eigen/arpack/arpack.py", line 1046, in get_OPinv_matvec
    return SpLuInv(A.tocsc()).matvec
    File "/root/anaconda/lib/python2.7/site-packages/scipy/sparse/linalg/eigen/arpack/arpack.py", line 907, in __init__
    self.M_lu = splu(M)
    File "/root/anaconda/lib/python2.7/site-packages/scipy/sparse/linalg/dsolve/linsolve.py", line 261, in splu
ilu=False, options=_options)
    MemoryError

My machine is having 16 GB ram.

Original Q&A

There are 1 answers

**rth** · Answer 1 · 2017-01-07T10:28:55+00:00

Spectral clustering algorithm has ~ O(n³) time complexity, and a fairly bad space complexity, since you are running out for memory with 16 GB RAM to process a ~0.8 GB dataset (10000x10000 array, assuming 64bit floats). It is therefore not suitable for large datasets.

Instead you should use a clustering algorithm that scales better. See the below benchmarks taken from the HDBSCAN documentation,

For instance, MiniBatchKMeans, DBSCAN from scikit-learn or HDBSCAN would scale better.

TechQA.

How much matrix size the function Spectral clustering of Scikit learn can handle?

There are 1 answers

Related Questions in MACHINE-LEARNING

Related Questions in SCIKIT-LEARN

Related Questions in CLUSTER-ANALYSIS

Related Questions in SPECTRAL-CLUSTERING

Popular Questions

Popular Tags

Trending Questions