I'm very new to the ANTLR world, and I'm trying to figure out how I can use this parsing tool to interpret a set of "noisy" strings. What I would like to achieve is the following.
Take, for example, this phrase: It's 10PM and the Lazy CAT is currently SLEEPING heavily on the SOFA in front of the TV
What I would like to extract is CAT, SLEEPING and SOFA, and I'd like a grammar that easily matches the pattern SUBJECT - VERB - INDIRECT OBJECT, where I could define:
VERB : 'SLEEPING' | 'WALKING';
SUBJECT : 'CAT'|'DOG'|'BIRD';
INDIRECT_OBJECT : 'CAR'| 'SOFA';
etc. I don't want to end up with a permanent NoViableAltException, since I can't describe every possible structure of the language. I just want to throw away the useless words and keep only the interesting ones.
It's as if I had a tokeniser and asked the parser: "OK, read the stream until you find a SUBJECT, then ignore the rest until you find a VERB, etc."
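That "read until you find a SUBJECT, then ignore the rest until you find a VERB" idea can be tried without ANTLR at all. Below is a minimal Python sketch, assuming words are split on anything that isn't a letter, digit or apostrophe; the keyword sets and the function name scan_in_order are hypothetical, copied from the lexer rules above for illustration.

```python
import re

# Hypothetical keyword sets, copied from the lexer rules in the question.
SUBJECTS = {"CAT", "DOG", "BIRD"}
VERBS = {"SLEEPING", "WALKING"}
INDIRECT_OBJECTS = {"CAR", "SOFA"}


def scan_in_order(text, categories):
    """Walk the words left to right. For each category in turn, skip words
    until one belongs to it. Return the matched words, or None if the text
    runs out before some category is found."""
    words = re.findall(r"[A-Za-z0-9']+", text)
    found = []
    i = 0
    for category in categories:
        while i < len(words) and words[i] not in category:
            i += 1  # ignore the uninteresting word
        if i == len(words):
            return None  # no word of this category left in the stream
        found.append(words[i])
        i += 1
    return found


sentence = ("It's 10PM and the Lazy CAT is currently SLEEPING heavily "
            "on the SOFA in front of the TV")
print(scan_in_order(sentence, [SUBJECTS, VERBS, INDIRECT_OBJECTS]))
# prints ['CAT', 'SLEEPING', 'SOFA']
```

With a fixed reading order, though, a sentence where SOFA comes before CAT returns None, so every order you want to accept has to be spelled out as its own pattern.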
I need to extract an organized structure from an unorganized set. For example, I would like to be able to interpret both of these patterns (I'm not judging the pertinence of this utterly basic and incorrect view of English grammar):
SUBJECT - VERB - INDIRECT OBJECT
INDIRECT OBJECT - SUBJECT - VERB
so that I can parse sentences like
It's 10PM and the Lazy CAT is currently SLEEPING heavily on the SOFA in front of the TV
or
It's 10PM and, on the SOFA in front of the TV, the Lazy CAT is currently SLEEPING heavily
You could create only a couple of lexer rules (the ones you posted, for example) and, as the last lexer rule, called ANY, match any single character and skip() it. The order is important here: the lexer tries the rules from top to bottom, so if it can't match any of the tokens VERB, SUBJECT or INDIRECT_OBJECT, it "falls through" to the ANY rule and skips that character.

You can then write parser rules over the remaining SUBJECT, VERB and INDIRECT_OBJECT tokens, with one alternative per pattern (SUBJECT VERB INDIRECT_OBJECT and INDIRECT_OBJECT SUBJECT VERB), so that sentences like the two in the question are parsed without running into a NoViableAltException.
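To see the idea without generating an ANTLR parser, here is a rough Python imitation of it. It is not ANTLR's API or generated code. tokenize() tries the keyword literals at each position and, like a final ANY rule, drops one character when none matches. parse() then picks an alternative from the first token's type. The names LEXER_RULES, PATTERNS, tokenize and parse are hypothetical, and the tie-breaking (longest literal, then earlier rule) is an assumption about how to model the lexer.

```python
# Hypothetical lexer rules in priority order, mirroring the question's grammar.
LEXER_RULES = [
    ("VERB", ["SLEEPING", "WALKING"]),
    ("SUBJECT", ["CAT", "DOG", "BIRD"]),
    ("INDIRECT_OBJECT", ["CAR", "SOFA"]),
]

# The two patterns from the question, one parser alternative each.
PATTERNS = [
    ("SUBJECT", "VERB", "INDIRECT_OBJECT"),
    ("INDIRECT_OBJECT", "SUBJECT", "VERB"),
]


def tokenize(text):
    """Emit (type, text) pairs for keywords; skip every other character,
    imitating a last ANY rule that matches one character and calls skip()."""
    tokens = []
    pos = 0
    while pos < len(text):
        best = None  # (length, type, literal) of the best match at pos
        for token_type, literals in LEXER_RULES:
            for literal in literals:
                # Strict '>' keeps the earlier rule when lengths are equal.
                if text.startswith(literal, pos) and (best is None or len(literal) > best[0]):
                    best = (len(literal), token_type, literal)
        if best is None:
            pos += 1  # the ANY fallback: drop this character
        else:
            tokens.append((best[1], best[2]))
            pos += best[0]
    return tokens


def parse(tokens):
    """Split the filtered tokens into sentence parts, choosing the pattern by
    the first token's type. Raises ValueError on a mismatch (roughly the role
    a NoViableAltException plays in ANTLR)."""
    parts = []
    i = 0
    while i < len(tokens):
        pattern = next((p for p in PATTERNS if p[0] == tokens[i][0]), None)
        if pattern is None:
            raise ValueError(f"no viable alternative at {tokens[i][1]!r}")
        chunk = tokens[i:i + len(pattern)]
        if tuple(t for t, _ in chunk) != pattern:
            raise ValueError(f"expected {pattern}, got {chunk}")
        parts.append(dict(chunk))  # maps token type to matched word
        i += len(pattern)
    return parts


s1 = ("It's 10PM and the Lazy CAT is currently SLEEPING heavily "
      "on the SOFA in front of the TV")
s2 = ("It's 10PM and, on the SOFA in front of the TV, the Lazy CAT "
      "is currently SLEEPING heavily")
for s in (s1, s2):
    print(parse(tokenize(s)))
# prints [{'SUBJECT': 'CAT', 'VERB': 'SLEEPING', 'INDIRECT_OBJECT': 'SOFA'}]
# prints [{'INDIRECT_OBJECT': 'SOFA', 'SUBJECT': 'CAT', 'VERB': 'SLEEPING'}]
```

Because every uninteresting character is dropped before parsing, the parser only has to describe the keyword patterns. An input that still breaks a pattern, such as "the DOG is WALKING" with no indirect object, is rejected.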