Fitting LDA to corpus in LDA-C format in gensim

638 views. Asked by jdv12 on 12 June 2015 at 13:15

I'm trying to fit an LDA model to a corpus in LDA-C format. I have this working for an HDP model, but I can't get it to work for LDA in gensim. I want the topic probability vector for each document as well as the probability distribution over words for each topic.

Here is the HDP model, which works fine:

The .dat file holds the corpus in LDA-C format and the .vocab file holds the unique words, one per line.

import gensim
import numpy

corpus = gensim.corpora.bleicorpus.BleiCorpus('ap.dat', 'ap.vocab')
d = gensim.corpora.Dictionary()
# note: despite the attribute name, this dict maps word id -> word
d.token2id = dict(enumerate(l[:-1] for l in open('ap.vocab')))
hdp = gensim.models.HdpModel(corpus, d.token2id)
alpha, beta = hdp.hdp_to_lda()

# save topic prior
numpy.savetxt(corpus_name+'.alpha',alpha)

# save word distribution for each topic
numpy.savetxt(corpus_name+'.beta',beta)

# save topic distribution for each document in Matrix Market format
doc_hdp = hdp[corpus]
gensim.corpora.MmCorpus.save_corpus(corpus_name+'.mm',doc_hdp)
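
For reference, the LDA-C layout that BleiCorpus reads puts one document per line: the number of distinct terms, followed by `term_id:count` pairs, where each `term_id` is the zero-based line index of a word in the .vocab file. A minimal sketch of parsing one such line by hand (the `parse_ldac_line` helper and the sample line are made up for illustration and are not part of gensim):

```python
def parse_ldac_line(line):
    """Parse one LDA-C document line into a list of (term_id, count) pairs."""
    fields = line.split()
    n_unique = int(fields[0])  # leading field: number of distinct terms
    pairs = []
    for field in fields[1:]:
        term_id, count = field.split(':')
        pairs.append((int(term_id), int(count)))
    if len(pairs) != n_unique:
        raise ValueError('expected %d terms, found %d' % (n_unique, len(pairs)))
    return pairs


# hypothetical document with 3 distinct terms
sample = '3 0:2 5:1 17:4'
doc = parse_ldac_line(sample)
print(doc)
```

Each parsed document is the same bag-of-words shape that gensim models consume, which is why the corpus object can be passed straight to the model constructors.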

Here is the LDA implementation. I get the proper vectors, but I can't find a function that gives me the priors or the word distribution for each topic:

corpus = gensim.corpora.bleicorpus.BleiCorpus('ap.dat', 'ap.vocab')
d = gensim.corpora.Dictionary()
d.token2id = dict(enumerate(l[:-1] for l in open('ap.vocab')))
lda = gensim.models.LdaModel(corpus, num_topics=10, id2word=d.token2id)


There are 0 answers
