Determine the document ID on Mahout LDA Output

Question


725 views Asked by Pedro Pi At 25 February 2011 at 07:47

I've successfully run Mahout LDA and displayed the output using the command mahout ldatopics.

For example, if my topics are science and sports, the output looks like this:

topic 0 basketball, play, baseball

topic 1 research, study, philosophy

My question now is how I can identify each individual article's group or cluster. Is there an ID number or some other kind of tracking, so that every new article I add is grouped into or added to a specific cluster/topic?

If I already have the cluster, what's the next step?

Thanks


There is 1 answer

**Kevin** · Answer 1 · 2011-03-03T17:15:09+00:00

I've been looking through the source code and can't find any mention of a theta matrix for calculating the probability of topics given a document. There's also no input for an alpha value to estimate the topics per document, and the LDAState class has a logProbWordGivenTopic(int, int) method but nothing like getProbTopicGivenDocument(). I can only assume that the Mahout implementation of LDA doesn't deal with discovering the topic distribution for specific documents. I'd love to be wrong, though, if someone else knows better.
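If only word-given-topic probabilities are available, one rough workaround is to score a document's words under each topic and pick the best-scoring one. The Python sketch below illustrates that idea outside of Mahout; it is not Mahout's API and not full LDA inference. It assumes the whole document comes from a single topic with a uniform prior over topics, and the names `TOPIC_WORD_PROB`, `FLOOR`, `topic_posterior`, and `assign_topic` are hypothetical, as are the probability values, which are made up to match the example topics in the question.

```python
import math

# Hypothetical per-topic word probabilities, standing in for values exported
# from a trained model. The numbers are invented for illustration only.
TOPIC_WORD_PROB = {
    0: {"basketball": 0.4, "play": 0.3, "baseball": 0.3},
    1: {"research": 0.4, "study": 0.3, "philosophy": 0.3},
}

# Assumed fallback probability for a word a topic has never seen.
FLOOR = 1e-6


def topic_posterior(tokens, topic_word_prob=TOPIC_WORD_PROB, floor=FLOOR):
    """Return {topic: p(topic | tokens)} under a uniform prior over topics.

    Simplification: the whole document is treated as drawn from one topic,
    so this approximates, but is not, LDA's per-document topic mixture.
    """
    log_scores = {}
    for topic, word_prob in topic_word_prob.items():
        # Sum of log p(word | topic) over every token in the document.
        log_scores[topic] = sum(
            math.log(word_prob.get(token, floor)) for token in tokens
        )

    # Normalise with log-sum-exp so long documents do not underflow.
    max_log = max(log_scores.values())
    total = sum(math.exp(score - max_log) for score in log_scores.values())
    return {
        topic: math.exp(score - max_log) / total
        for topic, score in log_scores.items()
    }


def assign_topic(tokens, **kwargs):
    """Return the ID of the most probable topic for a tokenised document."""
    posterior = topic_posterior(tokens, **kwargs)
    return max(posterior, key=posterior.get)
```

With these made-up numbers, `assign_topic(["play", "baseball"])` picks topic 0 and `assign_topic(["study", "research", "play"])` picks topic 1. Storing the returned topic ID next to your own article ID would give the kind of tracking the question asks about, although a proper per-document topic distribution still requires an implementation that infers theta.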
