Using online LDA to predict on test data

Question

869 views Asked by Vishnu At 07 November 2018 at 15:46

I am using online LDA for a topic modeling task. My core code is based on the original online LDA paper (Hoffman, Blei and Bach, "Online Learning for Latent Dirichlet Allocation," NIPS 2010), and the code is available at https://github.com/blei-lab/onlineldavb.

My training set has ~167,000 documents. The code generates lambda files as output, which I use to print the topics (printtopics.py from https://github.com/wellecks/online_lda_python). But I am not sure how to use them to find the topics of new test data (similar to model.get_document_topics in gensim). How can I do this?

Original Q&A

There are 2 answers

**Atendra Gautam** · Answer 1 · 2018-11-09T06:51:04+00:00

Apply the same preprocessing steps to the test data (tokenization, etc.), then use the vocabulary from your training data to transform the test data into a gensim corpus.

Once you have the test corpus, use the LDA model to find the document-topic distribution. Hope this helps.
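
The vocabulary-mapping step can be sketched in plain Python. This is only an illustration, not gensim's API: the function names, the regex tokenizer and the toy vocabulary are assumptions, and the tokenizer should be replaced by whatever preprocessing was applied to the training data. Each document becomes a sorted list of (word_id, count) pairs, which is the bag-of-words shape gensim uses for a corpus document.

```python
import re
from collections import Counter


def build_vocab_index(vocab_words):
    """Map each training-vocabulary word to its row/column index."""
    return {word: idx for idx, word in enumerate(vocab_words)}


def doc_to_bow(text, vocab_index):
    """Turn raw text into sorted (word_id, count) pairs.

    Assumption: lowercase alphabetic tokens. Use the same tokenizer
    that produced the training vocabulary instead.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    # Words that are not in the training vocabulary are dropped:
    # the trained model has no parameters for them.
    counts = Counter(vocab_index[t] for t in tokens if t in vocab_index)
    return sorted(counts.items())


# Toy vocabulary standing in for the one used at training time.
vocab_index = build_vocab_index(["topic", "model", "data"])
bow = doc_to_bow("Topic models need data, data, DATA!", vocab_index)
print(bow)
```

If separate id and count lists are needed, `zip(*bow)` splits the pairs, provided the document contains at least one known word.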

**Dan D.** · Answer 2 · 2018-11-09T11:04:01+00:00

The code you already have is enough to do this. What you have is lambda (the word-topic matrix); what you want to compute is gamma (the document-topic matrix).

All you need to do is call OnlineLDA.do_e_step on the documents; the results are the topic vectors. Performance might be improved by stripping the sstats computation out of it, as those are only needed to update lambda. The result would be a function that only infers the topic vectors for the model.

You don't need to update the model, since you aren't training it; updating is what update_lambda does after calling do_e_step.
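
As a rough illustration of such an inference-only function, here is a minimal sketch of the per-document E-step with lambda held fixed. It is not the code from onlineldavb: the names `infer_gamma`, `digamma` and `topic_proportions` are hypothetical, it uses plain Python lists instead of NumPy, the `digamma` helper is a stand-in for `scipy.special.psi`, and gamma starts at ones rather than at random values so the result is deterministic. The defaults for `max_iter` and `tol` are also assumptions. The update itself is the standard variational update for gamma in LDA.

```python
import math


def digamma(x):
    """Digamma for x > 0: recurrence up to x >= 6, then asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result + math.log(x) - 0.5 / x - series


def infer_gamma(word_ids, word_counts, lam, alpha, max_iter=100, tol=1e-3):
    """Infer one document's gamma without touching lambda.

    word_ids / word_counts: unique vocabulary indices in the document
    and how often each occurs. lam: K x V nested list (trained lambda).
    alpha: the document-topic Dirichlet prior used in training.
    """
    K = len(lam)
    # exp(E[log beta_kw]), only for the words present in this document.
    row_totals = [digamma(sum(row)) for row in lam]
    beta_d = [[math.exp(digamma(lam[k][w]) - row_totals[k]) for w in word_ids]
              for k in range(K)]
    gamma = [1.0] * K  # assumption: deterministic start
    for _ in range(max_iter):
        last = gamma
        total = digamma(sum(gamma))
        exp_elog_theta = [math.exp(digamma(g) - total) for g in gamma]
        # Normaliser of phi for each word (small constant avoids 0-division).
        phinorm = [sum(exp_elog_theta[k] * beta_d[k][i] for k in range(K)) + 1e-100
                   for i in range(len(word_ids))]
        gamma = [alpha + exp_elog_theta[k]
                 * sum(c / n * b for c, n, b in zip(word_counts, phinorm, beta_d[k]))
                 for k in range(K)]
        # Stop once the mean absolute change in gamma is small.
        if sum(abs(g - l) for g, l in zip(gamma, last)) / K < tol:
            break
    return gamma


def topic_proportions(gamma):
    """Normalise gamma into an estimated topic distribution."""
    total = sum(gamma)
    return [g / total for g in gamma]


# Toy lambda: topic 0 favours words 0 and 1, topic 1 favours words 2 and 3.
toy_lam = [[10.0, 10.0, 0.1, 0.1], [0.1, 0.1, 10.0, 10.0]]
print(topic_proportions(infer_gamma([0, 1], [3, 2], toy_lam, alpha=0.5)))
```

Normalising gamma gives a per-document topic distribution, roughly what get_document_topics reports in gensim. For a real lambda file the list comprehensions would normally be replaced by vectorised array operations.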
