Hierarchical LDA eats up all available memory and never finishes


I am waiting for my membership on the mailing list to be confirmed, so I thought I would ask here in the meantime to speed things up a little.

I am writing my master's thesis on topic modeling and use the Mallet implementations of LDA and HLDA.

I work on a corpus of over 4 million documents. While LDA (ParallelTopicModel) handles the dataset without issues, HLDA never gets further than about 5-6 iterations before filling up all available memory (I even ran the program with 90 GB of RAM). On smaller datasets (10-20k documents) it works fine.

That's how I train the model:

import cc.mallet.topics.HierarchicalLDA;
import cc.mallet.util.Randoms;

HierarchicalLDA hierarchicalLDAModel = new HierarchicalLDA();
hierarchicalLDAModel.initialize(trainInstances, testInstances, numLevels, new Randoms());
hierarchicalLDAModel.estimate(numIterations);

I'd gladly provide any other information you might need for troubleshooting; just comment and let me know.

Thank you very much in advance!

1 Answer

David Mimno

hLDA is a non-parametric model, which means that the number of parameters grows with the size of the data. There is currently no way to impose a maximum number of topics. The most effective way to limit the number of topics is to increase the topic-word smoothing parameter eta (NOT the CRP parameters). When eta is small, the model prefers to create a new topic rather than add a low-probability word to an existing topic.
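
For concreteness, a minimal sketch of what that could look like, reusing the trainInstances/testInstances variables from the question. It assumes the setEta setter on HierarchicalLDA (present in current Mallet sources, and also exposed as --eta by the HierarchicalLDATUI command-line tool); the value below is illustrative, not a recommendation:

import cc.mallet.topics.HierarchicalLDA;
import cc.mallet.util.Randoms;

HierarchicalLDA hlda = new HierarchicalLDA();
// Larger eta = stronger topic-word smoothing: the sampler would rather
// place a low-probability word in an existing topic than open a new one.
// Set it before initialize(), since derived sums are computed during setup.
hlda.setEta(1.0); // illustrative value; tune against held-out likelihood
hlda.initialize(trainInstances, testInstances, numLevels, new Randoms());
hlda.estimate(numIterations);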