Calculate SVD on a TF-IDF matrix


I want to perform Singular Value Decomposition (SVD) on a TF-IDF matrix. However, the TF-IDF model gives me a sparse representation in which each document is a list of (term index, score) pairs:

[(1,0.2) , (2,0.3) , (6,0.1) ...]
[(3,0.2) , (5,0.3) , (10,0.1) ...]

So the code u, s, v = svd(corpus_tfidf) does not work on it. I want a TF-IDF matrix that contains only the scores, not the term indices.

I have calculated TF-IDF like this:

from gensim import models

tfidf = models.TfidfModel(corpus)
corpus_tfidf = tfidf[corpus]

There is 1 answer

Eduard Ilyasov (best answer)

If you use gensim to generate the TF-IDF representation, you can use matutils to convert it to a dense numpy ndarray and back.

from gensim import matutils
tfidf_dense = matutils.corpus2dense(corpus_tfidf, num_terms).T

where num_terms is the number of unique terms in your corpus. It can be calculated this way:

num_terms = len(corpus_tfidf.obj.idfs)
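To make the conversion concrete, here is a minimal sketch of the same idea using only numpy, so it runs without gensim. The helper sparse_docs_to_dense is a hypothetical name, not part of gensim; it only illustrates the kind of documents-by-terms matrix you end up with after corpus2dense(...).T. The sample data and num_terms = 11 are assumptions chosen to match the format shown in the question.

```python
import numpy as np

def sparse_docs_to_dense(docs, num_terms):
    """Turn [(term_id, score), ...] documents into a (num_docs, num_terms) array.

    Hypothetical helper for illustration; with gensim you would use
    matutils.corpus2dense(corpus_tfidf, num_terms).T instead.
    """
    dense = np.zeros((len(docs), num_terms))
    for row, doc in enumerate(docs):
        for term_id, score in doc:
            dense[row, term_id] = score
    return dense

# Made-up TF-IDF output in the (term index, score) format from the question.
corpus_tfidf = [
    [(1, 0.2), (2, 0.3), (6, 0.1)],
    [(3, 0.2), (5, 0.3), (10, 0.1)],
]
num_terms = 11  # assumed vocabulary size: largest term index + 1

tfidf_dense = sparse_docs_to_dense(corpus_tfidf, num_terms)

# Reduced SVD: u is (docs, k), s is (k,), vt is (k, terms), with k = min(docs, terms).
u, s, vt = np.linalg.svd(tfidf_dense, full_matrices=False)
print(tfidf_dense.shape, u.shape, s.shape, vt.shape)
```

Note that np.linalg.svd returns the third factor already transposed, so the original matrix is recovered as u @ np.diag(s) @ vt. For a large vocabulary a dense matrix may not fit in memory, so this sketch is only suitable for small corpora.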