I'm reading the first example (tokenization) at http://www.openfst.org/twiki/bin/view/FST/FstExamples.
In the example, they create three FSTs, Mars.fst, Martian.fst, and man.fst, and manually run a few fst commands to merge them into one big transducer. The words "Mars", "Martian", and "man" come from wotw.syms, which contains 7102 words.
My question is: is there a smart way to create a word.fst covering all 7102 words, so that all of them end up in one big automaton, or does it have to be done manually, as they did for the three words "Martian", "Mars", and "man"?
Update: they provide a script for exactly this, https://www.openfst.org/twiki/pub/FST/FstExamples/makelex.py.txt, which we may simply run or adapt; a sketch of the idea follows.
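For reference, here is a minimal sketch of what such a lexicon builder can look like. This is my own sketch, not the contents of makelex.py; it assumes the ascii.syms/wotw.syms symbol tables from the example, and the file names lexicon.txt and lexicon.fst are invented for illustration. Each word becomes a branch off a shared start state, with the word symbol emitted on the first arc and <epsilon> on the rest, exactly like the hand-written Mars.txt, Martian.txt, and man.txt.

```python
#!/usr/bin/env python3
# Sketch (not makelex.py itself): read every word from wotw.syms and
# emit one AT&T-format text FST containing all of them.

lines = []
next_state = 1  # state 0 is the shared start state

with open("wotw.syms") as syms:
    for entry in syms:
        if not entry.strip():
            continue  # skip blank lines in the symbol table
        word = entry.split()[0]
        if word == "<epsilon>":
            continue  # skip the reserved epsilon symbol
        src = 0
        for i, ch in enumerate(word):
            # Assumption: every character of every word appears in
            # ascii.syms, as in the original example.
            out = word if i == 0 else "<epsilon>"
            lines.append(f"{src} {next_state} {ch} {out}")
            src = next_state
            next_state += 1
        lines.append(str(src))  # mark this word's last state final

with open("lexicon.txt", "w") as out_file:
    out_file.write("\n".join(lines) + "\n")

# Then compile and optimize with the standard OpenFst tools, e.g.:
#   fstcompile --isymbols=ascii.syms --osymbols=wotw.syms lexicon.txt |
#     fstdeterminize | fstminimize > lexicon.fst
```

Building one text file and compiling it once avoids 7102 separate fstcompile/fstunion invocations, and the fstdeterminize/fstminimize pass merges the shared prefixes and suffixes of the branches into a compact lexicon transducer.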