ANTLR on a noisy data stream Part 2

125 views Asked by BlackLabrador At 01 December 2010 at 13:51

Following a very interesting discussion with Bart Kiers on parsing a noisy data stream with ANTLR, I've run into another problem.

The aim is still the same: extract only the useful information, using the following grammar:

    VERB            : 'SLEEPING' | 'WALKING';
    SUBJECT         : 'CAT' | 'DOG' | 'BIRD';
    INDIRECT_OBJECT : 'CAR' | 'SOFA';
    ANY             : . {skip();};

    parse
      :  sentenceParts+ EOF
      ;

    sentenceParts
      :  SUBJECT VERB INDIRECT_OBJECT
      ;

A sentence like

    it's 10PM and the Lazy CAT is currently SLEEPING heavily on the SOFA in front of the TV.

produces the following:

[image: parse output]

This is perfect and does exactly what I want: from a long sentence, I extract only the words that are meaningful to me. But then I found the following problem. If the text contains a word that begins exactly like a token, I end up with a MismatchedTokenException or a NoViableAltException. For example, the input


    it's 10PM and the Lazy CAT is currently SLEEPING heavily, 
    with a DOGGY bag, on the SOFA in front of the TV.

produces an error:

[image: error output]

DOGGY is interpreted as the beginning of DOG, which is one of the alternatives of the SUBJECT token, and the lexer gets lost. How can I avoid this without defining DOGGY as a special token? I would like the parser to treat DOGGY as a word in its own right.

**BlackLabrador** · Accepted Answer · 2010-12-01T16:35:40+00:00

Well, it seems that adding this rule solves my problem:

    ANY2 : 'A'..'Z'+ {skip();};
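For context, here is a sketch of how the full grammar might look with that rule added. The grammar name `Noise` and the placement of `ANY2` (after the keyword rules, before `ANY`) are assumptions, not something stated above. Without `ANY2`, `DOGGY` is presumably tokenized as `DOG` followed by three skipped characters, so the parser receives a `SUBJECT` where it expects an `INDIRECT_OBJECT`. With `ANY2`, a longer run of capitals such as `DOGGY` should be consumed whole and skipped, while an exact `DOG` still becomes a `SUBJECT`, because ANTLR 3 resolves a tie between two lexer rules matching the same text in favour of the rule defined first.

```antlr
// Hypothetical grammar name; the original post does not give one.
grammar Noise;

parse
  :  sentenceParts+ EOF
  ;

sentenceParts
  :  SUBJECT VERB INDIRECT_OBJECT
  ;

// Keyword tokens come first so they win when the text is exactly a keyword.
VERB            : 'SLEEPING' | 'WALKING';
SUBJECT         : 'CAT' | 'DOG' | 'BIRD';
INDIRECT_OBJECT : 'CAR' | 'SOFA';

// Any other run of upper-case letters (DOGGY, TV, PM, ...) is discarded as one unit.
ANY2            : 'A'..'Z'+ {skip();};

// Everything else is discarded one character at a time.
ANY             : . {skip();};
```

Note that this only covers upper-case runs: a keyword glued to lower-case letters or digits (for example `hotDOG`) would still yield a `SUBJECT` token, so the character range in `ANY2` may need widening depending on the input.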
