JavaCC Syntax issue with understand ability

239 views Asked by At

I am starting to learn Javacc and trying to figure out this problem but I can't seem to fully understand if I am doing this right or not.

So what I am doing is making a parser for a custom language and generating Java parser source code using Javacc.

I think I am doing this right but have a lot of doubt on if this is correct or not.

Here is the .jj file I have so far.

options {
  JAVA_UNICODE_ESCAPE = true;
  STATIC = false;
}

PARSER_BEGIN(Custom_Lexer)
  public class Custom_Lexer {}
PARSER_END(Custom_Lexer)


void Custom_Lexer_Program() :
{}
{
  <BEGIN> <CLPL>
  ( Custom_Lexer_Statement() )*
  <END>
  <EOF>
}

void Custom_Lexer_Statement():
{}
{
    STATEMENT()
    <SEMICOLON>
}

void STATEMENT():
{}
{
    LOOKAHEAD(2) OUTPUT_STATEMENT()     |
    LOOKAHEAD(2) INPUT_STATEMENT()      |
    LOOKAHEAD(2) VARIABLE_DECLARATION() | 
    LOOKAHEAD(2) VARIABLE_ASSIGNMENT()  |
    LOOKAHEAD(2) IF_THEN_STATEMENT()
}

void OUTPUT_STATEMENT():
{}
{
    <OUTPUT> <EQUALS> EXPRESSION()
}

void INPUT_STATEMENT():
{}
{
    VARIABLE_DECLARATION()*
}

void VARIABLE_DECLARATION():
{}
{
    <VARIABLE> (<EQUALS> <INT> | <BOOL> | <STRING>)?
}

void VARIABLE_ASSIGNMENT():
{}
{
    <VARIABLE> (<EQUALS> EXPRESSION()
}

void IF_THEN_STATEMENT():
{}
{
    <IF> EXPRESSION() <THEN> VARIABLE_ASSIGNMENT() [<ELSE> VARIABLE_ASSIGNMENT()]
}
//Will define these later after the above issues are fixed
void EXPRESSION():
{}
{
    LOOKAHEAD(5) BINARY_EXPRESSION()        |
    LOOKAHEAD(5) IDENTIFIER_EXPRESSION()    |
    LOOKAHEAD(5) LITERAL_VALUE_EXPRESSION() |
    LOOKAHEAD(5) PARENTHESIZED_EXPRESSION()
}


//Reserved words
TOKEN: { <CLPL:   "CLPL"   > }
TOKEN: { <BEGIN:   "BEGIN"   > }
TOKEN: { <END:     "END"     > }
TOKEN: { <OUTPUT:  "OUTPUT"  > }
TOKEN: { <INPUT:   "INPUT"   > }
TOKEN: { <IF:      "IF"      > }
TOKEN: { <THEN:    "THEN"    > }


TOKEN: { <INT:    "int"      > }
TOKEN: { <BOOL:   "bool"     > }
TOKEN: { <STRING: "string"   > }


TOKEN: { <SEMICOLON:     ";" > }
TOKEN: { <LEFT_PAREN:    "(" > }
TOKEN: { <RIGHT_PAREN:   ")" > }
TOKEN: { <PLUS:          "+" > }
TOKEN: { <MINUS:         "-" > }
TOKEN: { <MULTIPLY:      "*" > }
TOKEN: { <DIVIDE:        "/" > }
TOKEN: { <EQUALITY:     "==" > }
TOKEN: { <EQUALS:        "=" > }
TOKEN: { <GT:            ">" > }
TOKEN: { <LT:            "<" > }


TOKEN: { <BOOLEAN_LITERAL: "true" | "false" > }


TOKEN: { <INTEGER_LITERAL: (["0"-"9"])+ > }


TOKEN: { <STRING_LITERAL: "\"" (~["\"","\\","\n","\r"] | "\\" (["n","t","b","r","f","\\","\'","\""] | ["0"-"7"] (["0"-"7"])? | ["0"-"3"] ["0"-"7"] ["0"-"7"]))* "\""> }


TOKEN: { <IDENTIFIER: (["a"-"z"]|["A"-"Z"]|"_")+((["a"-"z","A"-"Z","0"-"9","_"])*)? > } 
1

There are 1 answers

0
Theodore Norvell On BEST ANSWER

It's unfinished, but looks like a reasonable start. I'd suggest that you avoid all LOOKAHEAD specifications until you understand better what you are doing. Try left factoring so that all choices can be made with the default lookahead method.

One problem I see is that the conflict between VARIABLE_DECLARATION and INPUT_STATEMENT can't be resolved since any VARIABLE_DECLARATION is also an INPUT_STATEMENT.