ANTLR Mismatched input 'echo 1' expecting {'<', '>', ''', '`', '"', WS, UNQUOTED}


The grammar I wrote is meant to parse shell commands:

grammar cmdline;

command     : call | pipe | command ';' command;
pipe        : call '|' call
            | pipe '|' call ;

call : WS? (redirection WS?)* argument (WS? atom)* WS? ;
atom : redirection | argument ;
redirection : '<' WS? argument
            | '>' WS? argument ;
argument : (quoted | UNQUOTED)+ ;

quoted : singleQuoted
       | doubleQuoted
       | backQuoted ;
singleQuoted: '\'' NONNEWLINEANDNONSINGLEQUOTE* '\'' ;
backQuoted  : '`' NONNEWLINEANDNONBACKQUOTE* '`' ;
doubleQuoted: '"' (backQuoted | DOUBLEQUOTECONTENT)* '"' ;


// Lexer rules
WS : [ \t\r\n]+ ;
UNQUOTED : (~[ '"`\r\n|;><])+ ;
NONNEWLINEANDNONSINGLEQUOTE : (~[\n\r'])+ ;
NONNEWLINEANDNONBACKQUOTE : (~[\n\r`])+ ;
DOUBLEQUOTECONTENT : (~[\n\r"`])+ ;

I tried to parse

echo 1

and it gave the error

line 1:0 mismatched input 'echo 1' expecting {'<', '>', ''', '`', '"', WS, UNQUOTED}

What I do not understand is why the error occurs: I expected "echo" to match argument and "1" to match atom. Many thanks.

I tried changing the order of the lexer rules, but it did not help.

Answer by Bart Kiers:

ANTLR's lexer works in a very predictable way:

  1. try every lexer rule at the current position and take the one that matches as many characters as possible
  2. if two (or more) lexer rules match the same, longest, sequence of characters, pick the one that is defined first
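The two steps above can be sketched in Python as a simple lexer loop. This is a minimal sketch, not ANTLR's implementation (ANTLR compiles its lexer rules into a state machine and reports a token recognition error rather than raising an exception); the function name tokenize and the toy rules IF, ID and WS are hypothetical, chosen only to show both steps at work.

```python
import re

def tokenize(rules, text):
    """Tokenise text using the two steps described above:
    1. at the current position, find the longest match over all rules;
    2. if several rules match that same longest length, keep the one defined first.
    `rules` is a list of (name, regex) pairs in definition order.
    """
    compiled = [(name, re.compile(pattern)) for name, pattern in rules]
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, regex in compiled:
            m = regex.match(text, pos)
            # Strictly greater: a later rule must be *longer* to beat an
            # earlier one, so ties go to the rule defined first (step 2).
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens

# Hypothetical toy rules: IF is defined before ID, so the 2-character 'if'
# is a tie that IF wins, while 'iffy' is longer as an ID than as IF and
# therefore becomes a single ID token.
toy_rules = [("IF", r"if"), ("ID", r"[a-z]+"), ("WS", r"[ ]+")]
print(tokenize(toy_rules, "if iffy"))
```

This prints [('IF', 'if'), ('WS', ' '), ('ID', 'iffy')]: 'iffy' is never split into IF plus ID, because the longest match is decided before rule order is consulted.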

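Because of point 1, the three catch-all rules NONNEWLINEANDNONSINGLEQUOTE, NONNEWLINEANDNONBACKQUOTE and DOUBLEQUOTECONTENT, which can all consume the entire input echo 1 (6 characters, space included), beat UNQUOTED, which stops at the space after echo (4 characters). Because of point 2, the tie among those three goes to the one defined first, so the entire input echo 1 is tokenised as a single NONNEWLINEANDNONSINGLEQUOTE token. The call rule cannot start with that token, hence the error.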
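To see the two steps applied to the question's grammar, the sketch below transcribes its explicit lexer rules into Python regular expressions (ANTLR's ~[...] becomes [^...]) and measures how much of echo 1 each rule matches at position 0. The helper names match_lengths and winning_rule are hypothetical, and the implicit literal tokens ('<', '>', '|', ';' and the quote characters) are left out because none of them can match the leading e.

```python
import re

# The question's explicit lexer rules, transcribed to Python regexes and kept
# in the order they appear in the grammar.
LEXER_RULES = [
    ("WS", r"[ \t\r\n]+"),
    ("UNQUOTED", r"[^ '\"`\r\n|;><]+"),
    ("NONNEWLINEANDNONSINGLEQUOTE", r"[^\n\r']+"),
    ("NONNEWLINEANDNONBACKQUOTE", r"[^\n\r`]+"),
    ("DOUBLEQUOTECONTENT", r"[^\n\r\"`]+"),
]

def match_lengths(text):
    """Return (rule name, length of its match at position 0) for every rule."""
    lengths = []
    for name, pattern in LEXER_RULES:
        m = re.match(pattern, text)
        lengths.append((name, len(m.group()) if m else 0))
    return lengths

def winning_rule(text):
    """Longest match wins (step 1); max() returns the first maximal item,
    i.e. the rule defined first, which models the tie-break (step 2)."""
    return max(match_lengths(text), key=lambda pair: pair[1])

for name, length in match_lengths("echo 1"):
    print(f"{name:30} {length}")
print(winning_rule("echo 1"))
```

It reports 0 for WS, 4 for UNQUOTED and 6 for each of the three catch-all rules, with NONNEWLINEANDNONSINGLEQUOTE winning the tie. With the input echo alone, all four non-WS rules match 4 characters and UNQUOTED wins because it is defined first; it is the space, which the catch-all rules do not exclude, that lets them run past UNQUOTED.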

Note that the lexer is not directed by the parser: it just follows the two steps I mentioned earlier.