I have a general bag-of-words (BoW) corpus that I created, which yields documents in the format that gensim requires (see here).
However, those documents contain a lot of words that are used extremely often, so I wanted to use TF-IDF to balance that out.
So I do something like:
from gensim.models import LdaModel, TfidfModel
tfidf_model = TfidfModel(corpus)
new_corpus = tfidf_model[corpus]
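To make the intent concrete, here is a minimal pure-Python sketch of the kind of reweighting I expect from TF-IDF. It assumes the weighting count * log2(N / df) followed by L2 normalization, which I believe matches gensim's defaults but have not checked against its source. The names tfidf_weights and toy_corpus are hypothetical and not part of gensim.

```python
import math
from collections import Counter


def tfidf_weights(corpus):
    """Reweight a BoW corpus given as lists of (term_id, count) pairs.

    Assumed scheme: count * log2(num_docs / doc_freq), then L2 normalization.
    """
    num_docs = len(corpus)
    # Document frequency: in how many documents each term id appears.
    doc_freq = Counter(term_id for doc in corpus for term_id, _ in doc)
    weighted = []
    for doc in corpus:
        vec = [(t, c * math.log2(num_docs / doc_freq[t])) for t, c in doc]
        # Terms that occur in every document get weight 0 and are dropped.
        vec = [(t, w) for t, w in vec if w > 0.0]
        norm = math.sqrt(sum(w * w for _, w in vec))
        weighted.append([(t, w / norm) for t, w in vec] if norm else [])
    return weighted


# Hypothetical toy corpus: term 0 is very frequent and appears everywhere.
toy_corpus = [
    [(0, 5), (1, 1)],
    [(0, 4), (2, 2)],
    [(0, 6), (1, 1), (3, 1)],
]
toy_tfidf = tfidf_weights(toy_corpus)
```

In this toy corpus, term 0 disappears from every document despite having the highest raw counts, and in the third document the rarer term 3 ends up with a larger weight than term 1.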
Now I want to train my LDA model:
lda = LdaModel(corpus=new_corpus, num_topics=16)
It trains and converges fine, great. Now I have a new, unseen document that I want to project onto my LDA topics. Do I always need to transform this new document with the tfidf_model first? That is:
transformed_doc = tfidf_model[unseen_doc]
projections = lda[transformed_doc]
Or can gensim take the original document, apply the TF-IDF transformation itself, and then project onto the LDA topics, like this?
projections = lda[unseen_doc]
The gensim docs are a little unclear on whether the model knows about any previous transformations that were applied to a corpus.
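For reference, the explicit two-step version can be wrapped in a small helper so that the same chain of transformations is applied at training time and at query time. This is only a sketch: apply_chain and Scale are hypothetical names, and it assumes nothing about gensim beyond the bracket syntax already used in this question.

```python
def apply_chain(bow_doc, *models):
    """Apply each transformation in order, using model[vector] syntax."""
    vec = bow_doc
    for model in models:
        vec = model[vec]
    return vec


class Scale:
    """Hypothetical stand-in for a model; multiplies every weight."""

    def __init__(self, factor):
        self.factor = factor

    def __getitem__(self, doc):
        return [(term_id, weight * self.factor) for term_id, weight in doc]


# Stand-in demo: scale by 2, then by 3.
demo = apply_chain([(0, 1.0), (1, 2.0)], Scale(2.0), Scale(3.0))

# Intended usage with the models from this question (not run here):
# projections = apply_chain(unseen_doc, tfidf_model, lda)
```

With a helper like this, the question becomes whether the tfidf_model argument can be left out of the call.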