I am trying to calculate the perplexity score in Spyder for different numbers of topics in order to find the best model parameters with gensim.
However, the perplexity score is not decreasing as it is supposed to [1]. Others seem to have run into this exact issue as well, but as far as I know no solution has been posted.
Does anyone have any idea on how to solve the issue?
Code:

```python
from sklearn.model_selection import train_test_split
import gensim
import matplotlib.pyplot as plt

# `corpus`, `dictionary`, and `output_directory` are defined earlier

X_train, X_test = train_test_split(corpus, train_size=0.9, test_size=0.1, random_state=1)

topic_range = [10, 20, 25, 30, 40, 50, 60, 70, 75, 90, 100, 150, 200]

def lda_function(X_train, X_test, dictionary, nr_topics):
    ldamodel2 = gensim.models.LdaModel(X_train,
                                       id2word=dictionary,
                                       num_topics=nr_topics,
                                       alpha='auto',
                                       eta=0.01,
                                       passes=10,  # note: this comma was missing
                                       iterations=500,
                                       random_state=42)
    # log_perplexity returns the per-word likelihood bound (base 2),
    # so the perplexity is 2 ** (-bound)
    return 2 ** (-1 * ldamodel2.log_perplexity(X_test))

perplexities = [lda_function(X_train, X_test, dictionary, nr_topics=topic) for topic in topic_range]
print("\n", perplexities)

fig1, ax1 = plt.subplots(figsize=(7, 5))
ax1.scatter(x=topic_range, y=perplexities)
fig1.tight_layout()
fig1.savefig(output_directory + "Optimal Number of Topics (Perplexity Score).pdf", bbox_inches='tight')
```
[1]: https://i.stack.imgur.com/jFiF1.png