I have been using MALLET to infer topics from a text file containing 100,000 lines (around 34 MB in MALLET format). Now I need to run it on a file containing a million lines (around 180 MB), and I am getting a java.lang.OutOfMemoryError. Is there a way to split the file into smaller ones and still build a single model for the data in all the files combined? Thanks in advance.
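
For context: splitting the input on its own usually isn't enough, because MALLET still holds the full instance list in memory during training, so raising the JVM heap (for example by increasing the MEMORY setting in the bin/mallet launcher script, or passing a larger -Xmx when calling the Java API directly) is normally needed as well. Below is a minimal sketch, using the MALLET 2.0 Java API, of importing several split files through one shared pipe so that they share a single alphabet and feed one combined ParallelTopicModel. The chunk file names, topic count, and iteration count are placeholders, not values from the question.

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.InputStreamReader;
    import java.io.Reader;
    import java.util.ArrayList;
    import java.util.regex.Pattern;

    import cc.mallet.pipe.CharSequence2TokenSequence;
    import cc.mallet.pipe.CharSequenceLowercase;
    import cc.mallet.pipe.Pipe;
    import cc.mallet.pipe.SerialPipes;
    import cc.mallet.pipe.TokenSequence2FeatureSequence;
    import cc.mallet.pipe.iterator.CsvIterator;
    import cc.mallet.topics.ParallelTopicModel;
    import cc.mallet.types.InstanceList;

    public class SplitImportExample {
        public static void main(String[] args) throws Exception {
            // One shared pipe: every chunk is tokenized the same way and
            // mapped into the same feature alphabet.
            ArrayList<Pipe> pipeList = new ArrayList<Pipe>();
            pipeList.add(new CharSequenceLowercase());
            pipeList.add(new CharSequence2TokenSequence(
                    Pattern.compile("\\p{L}[\\p{L}\\p{P}]+\\p{L}")));
            pipeList.add(new TokenSequence2FeatureSequence());
            InstanceList instances = new InstanceList(new SerialPipes(pipeList));

            // Hypothetical chunk files, e.g. produced by `split -l 100000 big.txt part-`.
            // Each line is expected in the usual "name label text" format.
            String[] chunks = { "part-aa", "part-ab", "part-ac" };
            for (String chunk : chunks) {
                Reader reader = new InputStreamReader(
                        new FileInputStream(new File(chunk)), "UTF-8");
                // Regex groups: 1 = name, 2 = label, 3 = text.
                instances.addThruPipe(new CsvIterator(
                        reader,
                        Pattern.compile("^(\\S*)[\\s,]*(\\S*)[\\s,]*(.*)$"),
                        3, 2, 1));
                reader.close();
            }

            // Train one topic model over everything that was imported.
            ParallelTopicModel model = new ParallelTopicModel(100, 1.0, 0.01);
            model.addInstances(instances);
            model.setNumThreads(2);
            model.setNumIterations(1000);
            model.estimate();
        }
    }

Even with the import done in pieces like this, the training step itself is where the heap pressure comes from, so the -Xmx / MEMORY adjustment is usually the part that actually makes the OutOfMemoryError go away.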
I'm not sure how well MALLET scales to big data, but the Dragon Toolkit (http://dragon.ischool.drexel.edu/) can keep its data in disk-backed storage, so it can scale to effectively unlimited corpus sizes (at the cost of lower performance, of course).