LDA generated topics

So, I am relatively new to gensim and LDA (I started about two weeks ago), and I am having trouble trusting these results. The following topics were produced from 11 one-paragraph documents.

topic #0 (0.500): 0.059*island + 0.059*world + 0.057*computers + 0.056*presidential + 0.053*post + 0.047*posts + 0.046*tijuana + 0.045*vice + 0.045*tweets + 0.045*president

2015-06-04 16:22:07,891 : INFO : topic #1 (0.500): 0.093*computers + 0.064*world + 0.060*posts + 0.053*eurozone + 0.052*months + 0.049*tijuana + 0.048*island + 0.046*raise + 0.044*rates + 0.042*year

These topics just don't seem right; in fact, they seem almost nonsensical. How exactly should I read these results? Also, is it normal that the value in parentheses (0.500) is exactly the same for both topics?
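A minimal sketch of how to read these lines, assuming they come from gensim's `LdaModel.show_topics` logging: each topic is printed as `topic #i (alpha_i): weight*word + ...`. The number in parentheses is that topic's Dirichlet prior `alpha`, not something learned from the documents, and with gensim's default symmetric prior it is 1/num_topics, so 0.500 for two topics is expected. The weights are the probabilities of the top words within each topic. The `parse_topic_line` helper below is hypothetical (not part of gensim); it parses the two lines quoted above and lists the words both topics share, which shows how weakly the two topics are separated.

```python
import re

# Matches one gensim-style topic log line, e.g.
# "topic #0 (0.500): 0.059*island + 0.059*world + ...".
# re.search skips any "timestamp : INFO : " prefix in front of it.
TOPIC_LINE = re.compile(r"topic #(\d+) \(([\d.]+)\): (.*)")
TERM = re.compile(r"([\d.]+)\*(\S+)")


def parse_topic_line(line):
    """Return (topic_id, value_in_parentheses, [(word, weight), ...])."""
    match = TOPIC_LINE.search(line)
    if match is None:
        raise ValueError(f"not a topic line: {line!r}")
    topic_id, prior, terms = match.groups()
    words = [(word, float(weight)) for weight, word in TERM.findall(terms)]
    return int(topic_id), float(prior), words


topic0 = ("topic #0 (0.500): 0.059*island + 0.059*world + 0.057*computers + "
          "0.056*presidential + 0.053*post + 0.047*posts + 0.046*tijuana + "
          "0.045*vice + 0.045*tweets + 0.045*president")
topic1 = ("2015-06-04 16:22:07,891 : INFO : topic #1 (0.500): 0.093*computers + "
          "0.064*world + 0.060*posts + 0.053*eurozone + 0.052*months + "
          "0.049*tijuana + 0.048*island + 0.046*raise + 0.044*rates + 0.042*year")

id0, prior0, words0 = parse_topic_line(topic0)
id1, prior1, words1 = parse_topic_line(topic1)

num_topics = 2
# Under a symmetric prior every topic gets the same alpha = 1 / num_topics.
print(prior0, prior1, 1 / num_topics)  # 0.5 0.5 0.5

# Words that appear in the top 10 of both topics.
shared = sorted({w for w, _ in words0} & {w for w, _ in words1})
print(shared)  # ['computers', 'island', 'posts', 'tijuana', 'world']
```

Half of each topic's top ten words are shared, so the two topics are far less distinct than the identical 0.500 values might suggest; those values by themselves are not a symptom of a problem.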

Answer by Arnab Bhadury

So you only have 11 documents and are trying to get 2 topics out of them? It may simply be a case of not having enough data, but try iterating more.
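"Iterating more" means giving the inference algorithm more sweeps over the same documents. In gensim this is controlled by the `passes` and `iterations` arguments of `LdaModel`; `passes` defaults to 1, so by default the model goes over an 11-document corpus only once. gensim uses online variational Bayes, but the sketch below swaps in a different and simpler technique, collapsed Gibbs sampling in pure Python, only to illustrate what extra sweeps do: each sweep resamples every word's topic assignment, and the topic-word estimates are read off after the last sweep. The toy corpus and the `alpha`/`beta` values are made up for illustration.

```python
import random
from collections import Counter


def gibbs_lda(docs, num_topics, sweeps, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns per-topic word probabilities."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    V, K = len(vocab), num_topics
    doc_topic = [[0] * K for _ in docs]          # n_dk: tokens in doc d assigned to topic k
    topic_word = [Counter() for _ in range(K)]   # n_kw: times word w is assigned to topic k
    topic_total = [0] * K                        # n_k: tokens assigned to topic k

    # Start from a random topic for every token.
    z = []
    for d, doc in enumerate(docs):
        z.append([rng.randrange(K) for _ in doc])
        for w, k in zip(doc, z[d]):
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1

    for _ in range(sweeps):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove this token's current assignment from the counts ...
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # ... and resample it from the conditional p(z = j | all other tokens).
                weights = [
                    (doc_topic[d][j] + alpha)
                    * (topic_word[j][w] + beta)
                    / (topic_total[j] + V * beta)
                    for j in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    # Smoothed topic-word distributions; each one sums to 1 over the vocabulary.
    return [
        {w: (topic_word[k][w] + beta) / (topic_total[k] + V * beta) for w in vocab}
        for k in range(K)
    ]


# Hypothetical toy corpus reusing words from the topics above.
docs = [
    ["post", "posts", "tweets", "post"],
    ["tweets", "posts", "post", "tweets"],
    ["months", "year", "rates", "raise"],
    ["rates", "raise", "eurozone", "year"],
]

for sweeps in (1, 200):
    phi = gibbs_lda(docs, num_topics=2, sweeps=sweeps)
    for k, dist in enumerate(phi):
        top = sorted(dist, key=dist.get, reverse=True)[:3]
        print(f"sweeps={sweeps} topic={k} top words={top}")
```

Comparing the top words after 1 sweep and after 200 sweeps shows the same idea as raising `passes` in gensim: the assignments get more chances to settle before the topics are reported.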

BTW, is the negative log-likelihood or the perplexity going down after each iteration?
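One way to check this in gensim is `LdaModel.log_perplexity(chunk)`, which returns a per-word likelihood bound (higher is better) and logs a perplexity estimate computed as 2 ** (-bound) (lower is better). The small helpers below are hypothetical, not part of gensim: they apply the same conversion to bounds you have collected, for example after training with increasing `passes`, and check whether perplexity keeps falling. The bound values are invented inputs, not real measurements.

```python
def perplexities(per_word_bounds):
    """Convert per-word likelihood bounds to perplexity as 2 ** (-bound)."""
    return [2 ** (-b) for b in per_word_bounds]


def is_improving(values, tol=0.0):
    """True if each perplexity is no higher than the one before it (within tol)."""
    return all(later <= earlier + tol for earlier, later in zip(values, values[1:]))


# Invented example bounds, standing in for model.log_perplexity(corpus)
# measured after training with, say, 1, 5, 20 and 50 passes.
bounds = [-9.8, -8.9, -8.4, -8.35]
pp = perplexities(bounds)
print([round(p, 1) for p in pp])
print(is_improving(pp))  # True
```

If perplexity flattens out while the topics still look mixed, more iterations are unlikely to help much, and the small corpus becomes the more likely explanation.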

Just looking at the results, I think that if you iterate more you will get more sensible topics, because the algorithm has already put semantically close words together in the same topic (post, posts and tweets in topic #0; months and year in topic #1).