Do I need to transform unseen documents before projecting them onto model topics?


I have a general bag-of-words (BoW) corpus that yields documents in the format gensim requires (see here).

However, those documents contain a lot of words that are used extremely often, so I wanted to use TF-IDF to balance that out.

So I do something like:

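from gensim.models import LdaModel, TfidfModel

# Fit TF-IDF on the BoW corpus, then re-weight every document with it.
tfidf_model = TfidfModel(corpus)
new_corpus = tfidf_model[corpus]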
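To make concrete what that re-weighting does to a single BoW document, here is a minimal pure-Python sketch. It assumes gensim's default TF-IDF settings (raw term count times log2(N / document frequency), followed by L2 normalization); the helper `tfidf_weights` and the toy numbers are hypothetical and only for illustration, not gensim's actual implementation.

```python
import math


def tfidf_weights(bow_doc, doc_freqs, num_docs):
    """Re-weight a BoW document [(term_id, count), ...] with TF-IDF.

    Assumption: weight = count * log2(num_docs / doc_freq), then the
    whole vector is scaled to unit L2 length.
    """
    weighted = [
        (term_id, count * math.log2(num_docs / doc_freqs[term_id]))
        for term_id, count in bow_doc
    ]
    # Terms that occur in every document get weight 0 and are dropped.
    weighted = [(term_id, w) for term_id, w in weighted if w > 0.0]
    norm = math.sqrt(sum(w * w for _, w in weighted))
    return [(term_id, w / norm) for term_id, w in weighted] if norm else []


# Hypothetical corpus statistics: term 0 occurs in all 100 documents,
# term 1 in 5 of them, term 2 in 50 of them.
doc_freqs = {0: 100, 1: 5, 2: 50}
num_docs = 100

# Term 0 dominates the raw counts, but carries no weight after TF-IDF.
bow_doc = [(0, 12), (1, 2), (2, 3)]
weighted_doc = tfidf_weights(bow_doc, doc_freqs, num_docs)
```

The point is that the output holds float weights rather than integer counts, so whatever is trained on `new_corpus` sees a different kind of input than the raw BoW documents.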

Now I want to train my LDA model:

lda = LdaModel(corpus=new_corpus, num_topics=16)

It trains and converges fine, great. Now I have a new, unseen document that I want to project onto my LDA topics. Do I always need to transform this new document with the tfidf_model first? I.e.:

transformed_doc = tfidf_model[unseen_doc]
projections = lda[transformed_doc]

Or can gensim take the original document and know to apply the TF-IDF transformation first, then project onto the LDA topics?

projections = lda[unseen_doc]

The gensim docs are a little unclear on whether the model knows about any previous transformations that were applied to the corpus.

