Fitting LDA to corpus in LDA-C format in gensim

I'm trying to fit an LDA model to a corpus in LDA-C format. I have this working for an HDP model, but I can't make it work for LDA in gensim. I want the topic probability vector for each document as well as the probability distribution over words for each topic.
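
For context, each line of an LDA-C file describes one document as `N id:count id:count ...`, where N is the number of distinct terms in that document. The sketch below parses such lines by hand; the helper name `parse_ldac_line` and the sample lines are made up for illustration, and gensim's BleiCorpus reader should be preferred for real files.

```python
def parse_ldac_line(line):
    """Parse one LDA-C line into a list of (term_id, count) pairs."""
    fields = line.split()
    n_unique = int(fields[0])  # declared number of distinct terms
    pairs = []
    for field in fields[1:]:
        term_id, count = field.split(":")
        pairs.append((int(term_id), int(count)))
    if len(pairs) != n_unique:
        raise ValueError("declared %d terms, found %d" % (n_unique, len(pairs)))
    return pairs


# Hypothetical sample lines, not taken from ap.dat.
sample = ["3 0:2 5:1 7:4", "2 1:1 5:3"]
docs = [parse_ldac_line(line) for line in sample]
```

Each parsed document is a bag-of-words list, which is the same general shape of data that gensim models consume.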

Here is the HDP model, which works fine.

The .dat file holds the corpus in LDA-C format and the .vocab file holds the unique words, one per line:

import gensim
import numpy

corpus_name = 'ap'  # assumed output prefix; the original snippet leaves corpus_name undefined

corpus = gensim.corpora.bleicorpus.BleiCorpus('ap.dat', 'ap.vocab')
# map word id (line number in the .vocab file) -> word
id2word = dict(enumerate(l.rstrip('\n') for l in open('ap.vocab')))
hdp = gensim.models.HdpModel(corpus, id2word)
alpha, beta = hdp.hdp_to_lda()

# save topic prior
numpy.savetxt(corpus_name + '.alpha', alpha)

# save word distribution for each topic
numpy.savetxt(corpus_name + '.beta', beta)

# save topic distribution for each document in Matrix Market format
doc_hdp = hdp[corpus]
gensim.corpora.MmCorpus.save_corpus(corpus_name + '.mm', doc_hdp)

Here is the LDA implementation. I get the proper document vectors, but I can't find a function that gives me the priors or the per-topic word distribution:

corpus = gensim.corpora.bleicorpus.BleiCorpus('ap.dat', 'ap.vocab')
# map word id (line number in the .vocab file) -> word
id2word = dict(enumerate(l.rstrip('\n') for l in open('ap.vocab')))
lda = gensim.models.LdaModel(corpus, num_topics=10, id2word=id2word)