How to handle a Unicode dot in a table-driven FSM?

Asked by akonsu on 24 June 2015 at 22:05

Tools like "lex" and "flex", as far as I know, handle byte input only, that is, 8-bit characters (ASCII and its single-byte extensions). The FSM state transition tables generated by these tools stay small as a result, because there are only 256 possible symbols in the alphabet.
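To make the starting point concrete, here is a minimal sketch of a byte-indexed transition table. The regex `[a-z]+`, the two-state layout, and every name in it are made up for illustration; this is not what lex or flex actually emit, only the general shape: one row per state, 256 columns per row.

```python
# Hypothetical DFA for the regex [a-z]+ over a byte alphabet.
NUM_SYMBOLS = 256   # one column per possible byte value
DEAD = -1           # marker for "no transition"

def build_table():
    # table[state][byte] -> next state
    table = [[DEAD] * NUM_SYMBOLS for _ in range(2)]
    for b in range(ord("a"), ord("z") + 1):
        table[0][b] = 1   # start state: first lowercase letter
        table[1][b] = 1   # accepting state: further lowercase letters
    return table

TABLE = build_table()
ACCEPTING = {1}

def matches(data: bytes) -> bool:
    state = 0
    for b in data:            # iterating bytes yields ints in 0..255
        state = TABLE[state][b]
        if state == DEAD:
            return False
    return state in ACCEPTING
```

With 256 columns the table costs 256 entries per state, which is why the byte alphabet keeps things small.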

I am trying to figure out how to implement a . (any character) or a negated class such as [^...] in a regular expression evaluator when the alphabet is Unicode, say, UTF-8 encoded input. Are there any known techniques for keeping the transition tables manageable in this case? Giving every state an entry for every possible code point (there are over a million of them) is of course unreasonable.
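One technique I am fairly sure exists is to keep the alphabet at 256 bytes and let the automaton itself decode UTF-8, so that a single . expands into a handful of states that accept exactly one well-formed UTF-8 sequence. The sketch below assumes the byte ranges from the Unicode Standard's table of well-formed UTF-8 sequences; the state names and the function `matches_dot` are hypothetical, and it treats . as "any code point" (excluding newline would just mean dropping 0x0A from the first range).

```python
DEAD = -1
# Hypothetical state names: T1/T2/T3 = that many continuation bytes still
# expected; E0/ED/F0/F4 = lead bytes whose next byte has a restricted range.
START, DONE, T1, T2, E0, ED, T3, F0, F4 = range(9)

# (state, low byte, high byte, next state)
EDGES = [
    (START, 0x00, 0x7F, DONE),   # 1-byte sequences (ASCII)
    (START, 0xC2, 0xDF, T1),     # 2-byte lead
    (START, 0xE0, 0xE0, E0),     # 3-byte lead, must avoid overlong forms
    (START, 0xE1, 0xEC, T2),
    (START, 0xED, 0xED, ED),     # 3-byte lead, must avoid surrogates
    (START, 0xEE, 0xEF, T2),
    (START, 0xF0, 0xF0, F0),     # 4-byte lead, must avoid overlong forms
    (START, 0xF1, 0xF3, T3),
    (START, 0xF4, 0xF4, F4),     # 4-byte lead, must stay <= U+10FFFF
    (T1, 0x80, 0xBF, DONE),
    (T2, 0x80, 0xBF, T1),
    (T3, 0x80, 0xBF, T2),
    (E0, 0xA0, 0xBF, T1),
    (ED, 0x80, 0x9F, T1),
    (F0, 0x90, 0xBF, T2),
    (F4, 0x80, 0x8F, T2),
]

def build_table():
    # Still one row per state and 256 columns per row.
    table = [[DEAD] * 256 for _ in range(9)]
    for state, lo, hi, nxt in EDGES:
        for b in range(lo, hi + 1):
            table[state][b] = nxt
    return table

TABLE = build_table()

def matches_dot(data: bytes) -> bool:
    """True if data is exactly one well-formed UTF-8 encoded code point."""
    state = START
    for b in data:
        state = TABLE[state][b]
        if state == DEAD:
            return False
    return state == DONE
```

So in this sketch the whole dot costs nine states of 256 entries each, regardless of how many code points Unicode defines. A [^...] class would presumably need more states, since the excluded ranges have to be split along the same byte boundaries.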

Any ideas?
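Another idea that seems workable is to decode to code points first and index the table by character class instead of by character: all code points that the regex never distinguishes share one column. The sketch below hard-codes the classes for the hypothetical pattern [^a-z0-9]; the interval list, the class numbering, and the function names are assumptions made for illustration, and a real generator would compute the intervals from the ranges that appear in the pattern.

```python
from bisect import bisect_right

# Sorted interval start points; interval i covers STARTS[i] up to
# STARTS[i + 1] - 1, and the last interval runs to U+10FFFF.
STARTS = [0x0000, 0x0030, 0x003A, 0x0061, 0x007B]
# Class per interval: 1 = listed inside the brackets (0-9, a-z), 0 = everything else.
CLASS_OF_INTERVAL = [0, 1, 0, 1, 0]

def char_class(cp: int) -> int:
    # Binary search for the interval that contains the code point.
    return CLASS_OF_INTERVAL[bisect_right(STARTS, cp) - 1]

DEAD = -1
# table[state][class]: two columns instead of 1,114,112.
TABLE = [
    [1, DEAD],      # state 0 (start): accept any character outside the set
    [DEAD, DEAD],   # state 1 (accepting): nothing may follow
]

def matches_negated(text: str) -> bool:
    """True if text is exactly one character matching [^a-z0-9]."""
    state = 0
    for ch in text:
        state = TABLE[state][char_class(ord(ch))]
        if state == DEAD:
            return False
    return state == 1
```

The cost moves from table size to a binary search per input character, which may or may not be acceptable depending on how many distinct ranges the pattern set uses.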
