OpenFst - creating FSTs from a list of words


I'm reading the first example, on tokenization, at http://www.openfst.org/twiki/bin/view/FST/FstExamples.

In the example, they create three FSTs (Mars.fst, Martian.fst, and man.fst) and manually run some fst commands to merge them into one big transducer. They get the words "Mars", "Martian", and "man" from wotw.syms, which has 7102 words.

My question: is there a smart way to create a word FST for all 7102 words, so that they can all be combined into one big automaton, or does it have to be done manually, as they did for the three words Martian, Mars, and man?

Best answer, by Slyne D

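The example page provides a script for this: https://www.openfst.org/twiki/pub/FST/FstExamples/makelex.py.txt. Save it as makelex.py, then generate the text FST from the symbol table, compile it, and optimize it: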

cat wotw.syms | python2 makelex.py > lexicon_text.fst
fstcompile --isymbols=ascii.syms --osymbols=wotw.syms lexicon_text.fst lexicon.fst
fstrmepsilon lexicon.fst | fstdeterminize | fstminimize >lexicon_opt.fst
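
If you would rather not depend on the Python 2 script, the same idea can be sketched in a few lines of Python 3. This is not the content of makelex.py, just a minimal illustration under some assumptions: every line of wotw.syms starts with the word followed by its id, the epsilon symbol is named <epsilon>, and every character of every word has an entry in ascii.syms. The names read_words, lexicon_text_fst, and make_lexicon.py are made up for this example. Each word becomes one path from the shared start state 0, with the word itself as the output of the first arc and epsilon outputs on the rest.

```python
import sys

EPS = "<epsilon>"  # assumed name of the epsilon symbol in both symbol tables


def read_words(lines):
    """Yield the first column of each symbol-table line, skipping epsilon and blank lines."""
    for line in lines:
        fields = line.split()
        if fields and fields[0] != EPS:
            yield fields[0]


def lexicon_text_fst(words):
    """Return text-format FST lines mapping each word's characters to the word symbol."""
    lines = []
    next_state = 1  # state 0 is the start state shared by all words
    for word in words:
        src = 0
        for i, ch in enumerate(word):
            out = word if i == 0 else EPS  # emit the word on the first arc only
            lines.append(f"{src} {next_state} {ch} {out}")
            src = next_state
            next_state += 1
        lines.append(str(src))  # the last state of each word's path is final
    return lines


# Hypothetical usage: python3 make_lexicon.py wotw.syms > lexicon_text.fst
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as syms:
        print("\n".join(lexicon_text_fst(read_words(syms))))
```

The output can then go through the same fstcompile and fstrmepsilon | fstdeterminize | fstminimize steps as above. Unlike the example on the OpenFst page, this sketch does not add arcs for whitespace or punctuation between words, so it only recognizes single words.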