Following a very interesting discussion with Bart Kiers on parsing a noisy data stream with ANTLR, I'm now running into another problem...
The aim is still the same: extracting only the useful information with the following grammar:
VERB : 'SLEEPING' | 'WALKING';
SUBJECT : 'CAT' | 'DOG' | 'BIRD';
INDIRECT_OBJECT : 'CAR' | 'SOFA';
ANY : . {skip();};
parse
: sentenceParts+ EOF
;
sentenceParts
: SUBJECT VERB INDIRECT_OBJECT
;
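
For reference, here is roughly how the fragments above fit together as a single grammar file (a minimal sketch, assuming ANTLR 3 with the Java target; the grammar name Noisy is just a placeholder):

grammar Noisy;

// Keep only SUBJECT VERB INDIRECT_OBJECT groups; everything else is noise.
parse
  : sentenceParts+ EOF
  ;

sentenceParts
  : SUBJECT VERB INDIRECT_OBJECT
  ;

VERB            : 'SLEEPING' | 'WALKING';
SUBJECT         : 'CAT' | 'DOG' | 'BIRD';
INDIRECT_OBJECT : 'CAR' | 'SOFA';

// Any other character is discarded, one character at a time.
ANY             : . {skip();};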
A sentence like: it's 10PM and the Lazy CAT is currently SLEEPING heavily on the SOFA in front of the TV.
will produce the expected parse, keeping only CAT, SLEEPING and SOFA.
This is perfect and it's doing exactly what I want: from a big sentence, I'm extracting only the words that have meaning for me. But then I found the following error: if somewhere in the text I introduce a word that begins exactly like a token, I end up with a MismatchedTokenException or a NoViableAltException.
it's 10PM and the Lazy CAT is currently SLEEPING heavily, with a DOGGY bag, on the SOFA in front of the TV.
produces an error:
DOGGY is interpreted as beginning with DOG, which is one of the alternatives of the SUBJECT token, and the lexer gets lost. How could I avoid this without defining DOGGY as a special token? I would have liked the parser to understand DOGGY as a word in itself.
Well, it seems that adding this:

ANY2 : 'A'..'Z'+ {skip();};

solves my problem!
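
My understanding of why this works (just my reading of it, not verified against the ANTLR internals): the lexer prefers the longest possible match, so DOGGY is now swallowed whole by ANY2 and skipped, instead of being split into a SUBJECT token DOG plus leftover characters. When two rules match the same length, as with CAT, the rule defined first wins, so the real tokens have to stay above ANY2:

VERB            : 'SLEEPING' | 'WALKING';
SUBJECT         : 'CAT' | 'DOG' | 'BIRD';
INDIRECT_OBJECT : 'CAR' | 'SOFA';

// DOGGY (5 characters) matches here, which is longer than the DOG
// prefix (3 characters) that SUBJECT could match, so the whole word
// is skipped. CAT matches SUBJECT and ANY2 with the same length,
// and the earlier rule (SUBJECT) wins that tie.
ANY2            : 'A'..'Z'+ {skip();};

// Everything else (lowercase words, digits, punctuation, spaces)
// is still dropped one character at a time.
ANY             : . {skip();};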