How can I speed up the mkcls step in MGIZA++ or GIZA++? Word clustering is taking a very long time.


I am using MGIZA++ to align words in bitexts from the United Nations Parallel Corpus.

Before training the alignment model with MGIZA++, I need to run mkcls to build the word classes required by the Hidden Markov Model alignment step, like this:

mkcls -c50 -n10 -ptest.en -Vtest.en.vcb.classes

I'm running it on a corpus of 1,000,000 lines, but it takes a very long time and has not produced a result yet (on a small dataset it works).
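A minimal sketch of how the small-versus-full comparison could be set up, so the slowdown can be measured rather than guessed. It assumes the file name `test.en` from the command above; the helper `mkcls_cmd`, the sample file `test.small.en`, and the 10,000-line sample size are hypothetical names and values chosen for illustration. As far as I understand the mkcls usage text, `-c` is the number of classes and `-n` the number of optimization runs, so a lower `-n` should shorten the run, possibly at some cost in clustering quality.

```shell
# Hypothetical helper: print the mkcls command line for a given corpus.
# Assumed flag meanings: -c = number of classes, -n = number of optimization runs,
# -p = input corpus, -V = output classes file.
mkcls_cmd() {
  corpus=$1
  runs=${2:-10}   # defaults to the -n10 used in the question
  printf 'mkcls -c50 -n%s -p%s -V%s.vcb.classes\n' "$runs" "$corpus" "$corpus"
}

# Take the first 10,000 lines as a small sample (sample size is an assumption).
if [ -f test.en ]; then
  head -n 10000 test.en > test.small.en
fi

# Print the command for the sample with fewer optimization runs.
# This only prints the command; it does not run mkcls.
mkcls_cmd test.small.en 2
```

Running the printed command under `time`, first on the sample and then on the full corpus, would show how the runtime scales with corpus size and with `-n`.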

Is there a multi-threaded or parallel toolkit to do mkcls?
