I am using MGIZA++ to align words in bitexts from the United Nations Parallel Corpus.
Before training the alignment model with MGIZA++, I need to run mkcls
to build the word classes required by the Hidden Markov Model algorithm, like this:
mkcls -c50 -n10 -ptest.en -Vtest.en.vcb.classes
I'm running it on a corpus with 1,000,000 lines, but it takes a very long time and I still haven't gotten a result (it works when I try a small dataset).
Is there a multi-threaded or parallel tool that can do what mkcls does?
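
For anyone who wants to reproduce the small-dataset case, here is a minimal sketch of how a subset can be cut from the corpus. It assumes the corpus is a UTF-8 text file with one sentence per line; the helper name head_lines and the output file name test.small.en are hypothetical and only for illustration.

```python
from itertools import islice


def head_lines(src_path, dst_path, n):
    """Copy the first n lines of src_path to dst_path and return how many were written."""
    written = 0
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        # islice stops after n lines, so the full corpus is never loaded into memory
        for line in islice(src, n):
            dst.write(line)
            written += 1
    return written


# Example (hypothetical file names): keep the first 10,000 sentences
# head_lines("test.en", "test.small.en", 10000)
```

The resulting file can then be passed to mkcls with the same options as above, changing only the -p and -V arguments.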