Conflict between lexer rules in ANTLR4 for Fortran grammar

I'm developing a Fortran parser using ANTLR4, following the ISO Fortran 2018 standard. While implementing the lexer rules, I ran into a conflict between the NAME and LETTERSPEC rules: when the input consists of just a single letter, it is always tokenized as NAME and never as LETTERSPEC. Here is a simplified, partial version of the grammar:

lexer grammar FortranTestLex;

// Lexer rules
WS: [ \t\r\n]+ -> skip;

// R603 name -> letter [alphanumeric-character]...
NAME: LETTER (ALPHANUMERICCHARACTER)*;

// R865 letter-spec -> letter [- letter]
LETTERSPEC: LETTER (MINUS LETTER)?;

MINUS: '-';

// R601 alphanumeric-character -> letter | digit | underscore
ALPHANUMERICCHARACTER: LETTER | DIGIT | UNDERSCORE;

// R0002 Letter ->
//         A | B | C | D | E | F | G | H | I | J | K | L | M |
//         N | O | P | Q | R | S | T | U | V | W | X | Y | Z
LETTER: 'A'..'Z' | 'a'..'z';

// R0001 Digit -> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
DIGIT: '0'..'9';

// R602 UNDERSCORE -> _
UNDERSCORE: '_';
grammar FortranTest;
import FortranTestLex;

// Parser rules

programName: NAME;

// R1402 program-stmt -> PROGRAM program-name
programStmt: PROGRAM programName;

letterSpecList: LETTERSPEC (COMMA LETTERSPEC)*;

// R864 implicit-spec -> declaration-type-spec ( letter-spec-list )
implicitSpec: declarationTypeSpec LPAREN letterSpecList RPAREN;

implicitSpecList: implicitSpec (COMMA implicitSpec)*;

// R863 implicit-stmt -> IMPLICIT implicit-spec-list | IMPLICIT NONE [( [implicit-name-spec-list] )]
implicitStmt:
    IMPLICIT implicitSpecList
    | IMPLICIT NONE ( LPAREN implicitNameSpecList? RPAREN )?;

// R1403 end-program-stmt -> END [PROGRAM [program-name]]
endProgramStmt: END (PROGRAM programName?)?;

// R1401 main-program ->
//         [program-stmt] [specification-part] [execution-part]
//         [internal-subprogram-part] end-program-stmt
mainProgram: programStmt? endProgramStmt;

//R502 program-unit -> main-program | external-subprogram | module | submodule | block-data
programUnit: mainProgram;

//R501 program -> program-unit [program-unit]...    
program: programUnit (programUnit)*;  

With this grammar, a single letter is always tokenized as NAME, even where the parser expects a LETTERSPEC (for example, inside the parentheses of an IMPLICIT statement). How can I resolve this conflict in my lexer rules so that the input is tokenized correctly?

I've tried adjusting the order of the lexer rules and refining the patterns, but I haven't been able to achieve the desired behavior. Any insights or suggestions on how to properly handle this conflict would be greatly appreciated. Thank you!
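
For context on why reordering alone does not help: the ANTLR4 lexer picks the rule that matches the longest input and, on a tie, the rule defined first, and it does this without knowing which token the parser expects. A single letter therefore always goes to whichever of NAME or LETTERSPEC is listed first. One direction that is often suggested for this kind of overlap is to make letter-spec a parser rule built from NAME and MINUS tokens. The following is only a minimal sketch under that assumption. The grammar name `FortranLetterSpecSketch` is hypothetical, the helper rules are turned into fragments so they never compete as tokens, and checking that each NAME is exactly one letter is assumed to happen in a later semantic pass rather than in the grammar.

```antlr
grammar FortranLetterSpecSketch;

// R865 letter-spec -> letter [- letter]
// Expressed as a parser rule. The lexer only ever emits NAME here;
// the "exactly one letter" constraint is NOT enforced by this rule and
// would need a separate check (for example in a listener or visitor).
letterSpec: NAME (MINUS NAME)?;

letterSpecList: letterSpec (COMMA letterSpec)*;

COMMA: ',';
MINUS: '-';

// R603 name -> letter [alphanumeric-character]...
NAME: LETTER (LETTER | DIGIT | UNDERSCORE)*;

// Fragments never produce tokens on their own, so a lone letter or digit
// can no longer be matched by LETTER, DIGIT, or UNDERSCORE as a token.
fragment LETTER: [A-Za-z];
fragment DIGIT: [0-9];
fragment UNDERSCORE: '_';

WS: [ \t\r\n]+ -> skip;
```

With this shape, an input such as `a-h, o-z` would reach the parser as NAME MINUS NAME COMMA NAME MINUS NAME, and the decision about what counts as a letter-spec moves from the lexer to the parser.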