I am trying to build a model with 500 or 1000 topics on a dataset of 1M documents using Mallet LDA. After 60 iterations I get an ArrayIndexOutOfBoundsException. The error message is below:
<60> LL/token: -7.64386
overflow on type 8
java.lang.ArrayIndexOutOfBoundsException: 500
at cc.mallet.topics.WorkerRunnable.buildLocalTypeTopicCounts(WorkerRunnable.java:208)
at cc.mallet.topics.WorkerRunnable.run(WorkerRunnable.java:280)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
overflow on type 8
The command I am running is:
bin/mallet train-topics \
  --input data.mallet \
  --output-model lda.model \
  --inferencer-filename topic-inferencer-model.mallet \
  --output-topic-keys topic-keys.txt \
  --topic-word-weights-file topic-word-weights.txt \
  --word-topic-counts-file word-topic-counts-file.txt \
  --output-doc-topics doc-topics.txt \
  --num-topics 500 \
  --num-threads 16 \
  --num-iterations 1500 \
  --use-symmetric-alpha FALSE
Any suggestions would be much appreciated.