Non-negative matrix factorization of sparse input
Asked by Kamil Czarnogorski
Could anyone recommend a set of tools for applying standard NMF to sparse input data (a matrix of size 50k x 50k)? Thanks!
scikit-learn has an implementation of NMF for sparse matrices. You will need the bleeding-edge version from GitHub, though, since all released versions (up to and including 0.14) had a scalability problem. A demo follows.
Load some data: the twenty newsgroups text corpus.
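A minimal sketch of this step, assuming TF-IDF features from `TfidfVectorizer` (the answer does not say which vectorizer was used). Fetching the real corpus with `fetch_20newsgroups()` requires a network download, so the block vectorizes a few inline stand-in documents instead; replacing `docs` with `fetch_20newsgroups().data` gives the actual corpus.

```python
import scipy.sparse as sp
from sklearn.feature_extraction.text import TfidfVectorizer

# Stand-in documents (hypothetical). For the real corpus use:
#   from sklearn.datasets import fetch_20newsgroups
#   docs = fetch_20newsgroups().data   # downloads the data on first call
docs = [
    "the rocket launch was delayed by weather",
    "the team won the hockey game in overtime",
    "new graphics card drivers fix the crash",
    "the shuttle reached orbit after launch",
    "hockey playoffs start next week",
    "driver update improves graphics performance",
]

# TF-IDF turns the documents into a non-negative scipy.sparse matrix
# (one row per document, one column per vocabulary term)
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)

print(type(X).__name__, X.shape, X.nnz)
```

The result stays sparse throughout, which is what lets the same pipeline scale to a corpus the size of 20 newsgroups.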
Now fit an NMF model with 10 components.
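A minimal sketch of the fitting step with the current `sklearn.decomposition.NMF` API, which may differ in details from the development version referred to above. To keep it self-contained, a random non-negative sparse matrix from `scipy.sparse.random` stands in for the TF-IDF matrix; the shape, density and `tol=1e-2` are illustrative assumptions, not the values used in the answer.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.decomposition import NMF

# Random non-negative sparse matrix standing in for the TF-IDF data:
# 500 "documents" x 2000 "terms", about 1% of entries non-zero, values in [0, 1)
X = sp.random(500, 2000, density=0.01, format="csr", random_state=0)

# 10 components; a looser tolerance than the default trades accuracy for speed
nmf = NMF(n_components=10, tol=1e-2, max_iter=200, random_state=0)
W = nmf.fit_transform(X)   # document-by-component weights, shape (500, 10)
H = nmf.components_        # component-by-term loadings, shape (10, 2000)

print(W.shape, H.shape, nmf.n_iter_, nmf.reconstruction_err_)
```

`fit_transform` accepts the sparse matrix directly; only the two factors `W` and `H` are stored densely.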
I loosened the tolerance option (`tol`) so that this converges in a few seconds. With the default tolerance, it takes quite a bit longer. Memory usage for this problem is around 360 MB.
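To relate this to the 50k x 50k matrix in the question, a rough back-of-the-envelope estimate of input and factor sizes. It assumes float64 values, 32-bit CSR indices, 10 components and a hypothetical 0.1% density, and it ignores the solver's temporary arrays, so real peak usage will be higher.

```python
def dense_bytes(n_rows, n_cols, itemsize=8):
    """Bytes for a dense n_rows x n_cols array."""
    return n_rows * n_cols * itemsize


def csr_bytes(n_rows, nnz, itemsize=8, index_size=4):
    """Bytes for a CSR matrix: `data` and `indices` hold nnz entries each,
    `indptr` holds n_rows + 1 offsets."""
    return nnz * (itemsize + index_size) + (n_rows + 1) * index_size


def factor_bytes(n_rows, n_cols, k, itemsize=8):
    """Bytes for the dense NMF factors W (n_rows x k) and H (k x n_cols)."""
    return (n_rows * k + k * n_cols) * itemsize


n = 50_000
nnz = n * n // 1000  # hypothetical density: 0.1% of entries non-zero

print(f"dense input : {dense_bytes(n, n) / 1e9:.1f} GB")
print(f"CSR input   : {csr_bytes(n, nnz) / 1e6:.1f} MB")
print(f"W and H     : {factor_bytes(n, n, k=10) / 1e6:.1f} MB")
```

Under these assumptions a dense copy of the input would need about 20 GB, while the CSR input and the two factors together stay under 40 MB, which is why sparse input support matters at this size.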
Disclaimer: I'm a contributor to this implementation, so this is not unbiased advice.