I'm starting to learn JavaCC and am trying to work through this problem, but I can't tell whether I'm doing it right.
I'm writing a parser for a custom language and using JavaCC to generate the Java parser source code.
I think I'm on the right track, but I have serious doubts about whether it's correct.
Here is the .jj file I have so far:
```
options {
    JAVA_UNICODE_ESCAPE = true;
    STATIC = false;
}

PARSER_BEGIN(Custom_Lexer)
public class Custom_Lexer {}
PARSER_END(Custom_Lexer)

void Custom_Lexer_Program() :
{}
{
    <BEGIN> <CLPL>
    ( Custom_Lexer_Statement() )*
    <END>
    <EOF>
}

void Custom_Lexer_Statement() :
{}
{
    STATEMENT()
    <SEMICOLON>
}

void STATEMENT() :
{}
{
    LOOKAHEAD(2) OUTPUT_STATEMENT() |
    LOOKAHEAD(2) INPUT_STATEMENT() |
    LOOKAHEAD(2) VARIABLE_DECLARATION() |
    LOOKAHEAD(2) VARIABLE_ASSIGNMENT() |
    LOOKAHEAD(2) IF_THEN_STATEMENT()
}

void OUTPUT_STATEMENT() :
{}
{
    <OUTPUT> <EQUALS> EXPRESSION()
}

void INPUT_STATEMENT() :
{}
{
    // JavaCC only allows *, + and ? after a parenthesized expansion
    ( VARIABLE_DECLARATION() )*
}

void VARIABLE_DECLARATION() :
{}
{
    <VARIABLE> (<EQUALS> <INT> | <BOOL> | <STRING>)?
}

void VARIABLE_ASSIGNMENT() :
{}
{
    <VARIABLE> <EQUALS> EXPRESSION()
}

void IF_THEN_STATEMENT() :
{}
{
    <IF> EXPRESSION() <THEN> VARIABLE_ASSIGNMENT() [<ELSE> VARIABLE_ASSIGNMENT()]
}

// Will define these later, after the above issues are fixed
void EXPRESSION() :
{}
{
    LOOKAHEAD(5) BINARY_EXPRESSION() |
    LOOKAHEAD(5) IDENTIFIER_EXPRESSION() |
    LOOKAHEAD(5) LITERAL_VALUE_EXPRESSION() |
    LOOKAHEAD(5) PARENTHESIZED_EXPRESSION()
}

// Whitespace between tokens is discarded by the token manager
SKIP : { " " | "\t" | "\n" | "\r" }

// Reserved words
TOKEN: { <CLPL: "CLPL" > }
TOKEN: { <BEGIN: "BEGIN" > }
TOKEN: { <END: "END" > }
TOKEN: { <OUTPUT: "OUTPUT" > }
TOKEN: { <INPUT: "INPUT" > }
TOKEN: { <IF: "IF" > }
TOKEN: { <THEN: "THEN" > }
TOKEN: { <ELSE: "ELSE" > }
TOKEN: { <INT: "int" > }
TOKEN: { <BOOL: "bool" > }
TOKEN: { <STRING: "string" > }

// Punctuation and operators
TOKEN: { <SEMICOLON: ";" > }
TOKEN: { <LEFT_PAREN: "(" > }
TOKEN: { <RIGHT_PAREN: ")" > }
TOKEN: { <PLUS: "+" > }
TOKEN: { <MINUS: "-" > }
TOKEN: { <MULTIPLY: "*" > }
TOKEN: { <DIVIDE: "/" > }
TOKEN: { <EQUALITY: "==" > }
TOKEN: { <EQUALS: "=" > }
TOKEN: { <GT: ">" > }
TOKEN: { <LT: "<" > }

// Literals and identifiers
TOKEN: { <BOOLEAN_LITERAL: "true" | "false" > }
TOKEN: { <INTEGER_LITERAL: (["0"-"9"])+ > }
TOKEN: { <STRING_LITERAL: "\"" (~["\"","\\","\n","\r"] | "\\" (["n","t","b","r","f","\\","\'","\""] | ["0"-"7"] (["0"-"7"])? | ["0"-"3"] ["0"-"7"] ["0"-"7"]))* "\""> }
TOKEN: { <IDENTIFIER: ["a"-"z","A"-"Z","_"] (["a"-"z","A"-"Z","0"-"9","_"])* > }
```
It's unfinished, but it looks like a reasonable start. I'd suggest that you avoid all `LOOKAHEAD` specifications until you understand better what you are doing. Try left factoring so that every choice can be made with the default lookahead of a single token. One problem I see is that the conflict between `VARIABLE_DECLARATION` and `INPUT_STATEMENT` can't be resolved, since any `VARIABLE_DECLARATION` is also an `INPUT_STATEMENT`: no amount of lookahead can tell the parser which of the two to pick.
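As a sketch of that advice, here is one way `STATEMENT` could be left factored so that JavaCC picks every alternative from the next token alone, with no `LOOKAHEAD` at all. It rests on assumptions that are not in the question: that `INPUT_STATEMENT` is meant to begin with the currently unused `<INPUT>` keyword (the `<INPUT> <VARIABLE>` shape below is hypothetical), that the `<EQUALS>` in `VARIABLE_DECLARATION` was meant to precede each type keyword, and that `EXPRESSION()` never begins with `<INT>`, `<BOOL>` or `<STRING>`. `VARIABLE_STATEMENT` is a hypothetical name; the other productions stay as they are.

```
void STATEMENT() :
{}
{
    OUTPUT_STATEMENT()       // always starts with <OUTPUT>
  | INPUT_STATEMENT()        // always starts with <INPUT>
  | VARIABLE_STATEMENT()     // always starts with <VARIABLE>
  | IF_THEN_STATEMENT()      // always starts with <IF>
}

// Hypothetical: starting with the INPUT keyword means this production
// no longer overlaps a plain variable declaration.
void INPUT_STATEMENT() :
{}
{
    <INPUT> <VARIABLE>
}

// Hypothetical: declaration and assignment left factored into one
// production. The shared prefix <VARIABLE> is consumed once, and the
// token after <EQUALS> decides which form is being parsed.
void VARIABLE_STATEMENT() :
{}
{
    <VARIABLE>
    [
        <EQUALS>
        (
            <INT> | <BOOL> | <STRING>   // declaration with a type keyword
          | EXPRESSION()                // assignment
        )
    ]
}
```

Because each alternative of `STATEMENT` now starts with a different token, the default one-token lookahead is enough, and the declaration/assignment decision moves to the point where the two forms actually differ. `EXPRESSION` would need the same treatment once its alternatives are defined.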