I want to perform Singular Value Decomposition on a TF-IDF matrix. But the TF-IDF output is a list of (term index, score) pairs per document, like this:
[(1,0.2) , (2,0.3) , (6,0.1) ...]
[(3,0.2) , (5,0.3) , (10,0.1) ...]
So the code u,s,v = svd(corpus_tfidf)
will not work on it.
I want a TF-IDF matrix that has only scores, not term indices.
I have calculated TF-IDF like this:
tfidf = models.TfidfModel(corpus)
corpus_tfidf = tfidf[corpus]
If you use gensim to generate the TF-IDF, you can use matutils to convert your TF-IDF representation to a dense NumPy ndarray and vice versa.
where num_terms is the number of unique terms in your corpus. It can be calculated this way: