The grammar I wrote is meant to parse shell commands:
grammar cmdline;
command : call | pipe | command ';' command;
pipe : call '|' call
| pipe '|' call ;
call : WS? (redirection WS?)* argument (WS? atom)* WS? ;
atom : redirection | argument ;
redirection : '<' WS? argument
| '>' WS? argument ;
argument : (quoted | UNQUOTED)+ ;
quoted : singleQuoted
| doubleQuoted
| backQuoted ;
singleQuoted: '\'' NONNEWLINEANDNONSINGLEQUOTE* '\'' ;
backQuoted : '`' NONNEWLINEANDNONBACKQUOTE* '`' ;
doubleQuoted: '"' (backQuoted | DOUBLEQUOTECONTENT)* '"' ;
// Lexer rules
WS : [ \t\r\n]+ ;
UNQUOTED : (~[ '"`\r\n|;><])+ ;
NONNEWLINEANDNONSINGLEQUOTE : (~[\n\r'])+ ;
NONNEWLINEANDNONBACKQUOTE : (~[\n\r`])+ ;
DOUBLEQUOTECONTENT : (~[\n\r"`])+ ;
I tried to parse
echo 1
and it gave the error
line 1:0 mismatched input 'echo 1' expecting {'<', '>', ''', '`', '"', WS, UNQUOTED}.
What I do not understand is why the error occurs: I expected "echo" to match "argument" and "1" to match "atom". I also tried changing the order of the lexer rules, but it did not help. Many thanks.
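ANTLR's lexer works in a very predictable way:
1. It tries to match as many characters as possible.
2. When two or more lexer rules match the same characters, the rule defined first wins.
Three of your rules (NONNEWLINEANDNONSINGLEQUOTE, NONNEWLINEANDNONBACKQUOTE and DOUBLEQUOTECONTENT) can match all of echo 1, which is longer than the echo that UNQUOTED can match, so UNQUOTED loses on point 1. Because of point 2, the tie between those three is won by the one defined first, so the entire input echo 1 is tokenised as a single NONNEWLINEANDNONSINGLEQUOTE token. The parser never sees an UNQUOTED token, hence the mismatched input error.
Note that the lexer is not directed by the parser: it just follows the two steps I mentioned earlier. This is also why reordering the lexer rules did not help: the order only breaks ties between matches of equal length.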
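To see the longest-match and first-rule-wins behaviour in isolation, here is a small Python sketch that imitates it. It is not ANTLR itself: the lexer rules from the question are approximated as regular expressions, the implicit literal tokens such as '|' and ';' are left out, and the names RULES, next_token and tokenize are made up for this illustration.

```python
import re

# Lexer rules from the question, in definition order, approximated as regexes.
# Assumption: implicit literal tokens ('|', ';', '<', ...) are omitted here.
RULES = [
    ("WS", re.compile(r"""[ \t\r\n]+""")),
    ("UNQUOTED", re.compile(r"""[^ '"`\r\n|;><]+""")),
    ("NONNEWLINEANDNONSINGLEQUOTE", re.compile(r"""[^\n\r']+""")),
    ("NONNEWLINEANDNONBACKQUOTE", re.compile(r"""[^\n\r`]+""")),
    ("DOUBLEQUOTECONTENT", re.compile(r"""[^\n\r"`]+""")),
]


def next_token(text, pos):
    """Pick the rule with the longest match at pos; earlier rules win ties."""
    best = None
    for name, pattern in RULES:
        m = pattern.match(text, pos)
        # Strict '>' keeps the earlier rule when two matches have equal length.
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group())
    return best


def tokenize(text):
    """Split text into (rule name, matched text) pairs, without parser feedback."""
    pos, tokens = 0, []
    while pos < len(text):
        tok = next_token(text, pos)
        if tok is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append(tok)
        pos += len(tok[1])
    return tokens


print(tokenize("echo 1"))  # [('NONNEWLINEANDNONSINGLEQUOTE', 'echo 1')]
print(tokenize("echo"))    # [('UNQUOTED', 'echo')]
```

With echo alone, all four non-whitespace rules match the same four characters, so UNQUOTED wins because it is defined first. As soon as the input contains a space, the rules that accept spaces produce a longer match and UNQUOTED is never chosen.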