I'm interested in using `lex` to tokenize my input string, but I do not want it to be possible to "fail". Instead, I want to have some type of `DEFAULT` or `TEXT` token, which would contain all the non-matching characters between recognized tokens.
Anyone have experience with something like this?
To expand on @Chris Dodd's answer, the final rule in any lex script should be:
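A minimal sketch of such a catch-all, assuming the scanner feeds a yacc/bison parser that accepts character literals as tokens. The exact form of the rule is an assumption, and `NUMBER`, `IDENT` and the small `main` driver are hypothetical names added only for illustration:

```lex
%{
#include <stdio.h>
/* Hypothetical token codes. In a real project they come from the
   yacc/bison-generated header; values above 255 cannot collide
   with plain character codes. */
enum { NUMBER = 258, IDENT };
%}
%%
[0-9]+                  { return NUMBER; }
[A-Za-z_][A-Za-z0-9_]*  { return IDENT; }
[ \t\r\n]+              { /* skip whitespace; '.' does not match newline */ }
.                       { return yytext[0]; /* assumed catch-all: pass the character through as its own token */ }
%%
int yywrap(void) { return 1; }

/* Illustrative driver: print each token code with its matched text. */
int main(void) {
    int tok;
    while ((tok = yylex()) != 0)
        printf("%d %s\n", tok, yytext);
    return 0;
}
```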
and don't write any single-character rules like `"+" return PLUS;`. Just use the special characters you recognize directly in the grammar, e.g. `term: term '+' factor;`.
This practice: