Breaking lexical elements into pieces


My grammar file, test.ebnf, looks like this:

start = identifier ;

identifier =
  /[a-z]*/ rest;

rest = /[0-9]*/ ;

When I run this grammar on the input "test1234", I want it to yield "test1234" as a single lexeme, but instead the AST looks like this:

AST:
['test', '1234']

I've tried running with the nameguard feature set to false, with no luck. How can I get this behaviour without writing the rule as identifier = /[a-z]*[0-9]*/?

There is 1 answer

Answer by Apalala:

Grako always returns a list with one object per element on a rule's right-hand side, except when there is only one element. Even named elements produce a list when several matches share the same name. Grako does not simply concatenate the elements, because their ASTs may be objects as complex as the project requires.
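To make the shape of the result concrete, here is a toy model of that behaviour using only Python's re module. It is not Grako's implementation; match_sequence is a hypothetical helper written for this illustration, and it only handles a flat sequence of regular expressions.

```python
import re


def match_sequence(patterns, text):
    """Toy model: match each pattern in order, keeping one result per element."""
    pos = 0
    parts = []
    for pattern in patterns:
        m = re.compile(pattern).match(text, pos)
        if m is None:
            raise ValueError("no match for %r at position %d" % (pattern, pos))
        parts.append(m.group(0))
        pos = m.end()
    # A single element comes back bare; several elements come back as a list.
    return parts[0] if len(parts) == 1 else parts


# Two elements on the right-hand side, as in the question's grammar.
two_elements = match_sequence([r'[a-z]*', r'[0-9]*'], 'test1234')
# One element on the right-hand side.
one_element = match_sequence([r'[a-z]*[0-9]*'], 'test1234')

print(two_elements)
print(one_element)
```

The first call yields a two-item list and the second a plain string, which mirrors the difference between the AST in the question and the result the asker wants.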

In your case, you can use a semantic action to join the identifier parts:

# Method of the semantics class; ast is the list of parts, e.g. ['test', '1234'].
def identifier(self, ast):
    return ''.join(ast)
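For context, a minimal sketch of where such a method lives. The class name IdentifierSemantics is made up for this example, and the method is called directly here rather than by a parser. With Grako, an instance would normally be handed to the generated parser through the semantics argument of its parse method; check the documentation of your Grako version for the exact call.

```python
class IdentifierSemantics:
    """Hypothetical semantics class; method names mirror grammar rule names."""

    def identifier(self, ast):
        # ast holds one string per right-hand-side element,
        # e.g. ['test', '1234'] for the grammar in the question.
        return ''.join(ast)


semantics = IdentifierSemantics()

# Called by hand here with the AST from the question; a parser would
# call this method itself after matching the identifier rule.
joined = semantics.identifier(['test', '1234'])
print(joined)
```

This keeps the grammar unchanged and moves the concatenation into the semantics, which is useful when rest has to remain a separate rule for other reasons.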

Or redefine the identifier rule to have a single element:

identifier
    =
    /[a-z]+[0-9]*|[a-z]*[0-9]+/
    ;

(Note the changes to the regular expression, which ensure that it never matches an empty string.)
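The difference between the two expressions can be checked with Python's re module alone. This is a quick sketch; LOOSE, STRICT and lexeme are names chosen for the example, not anything from Grako.

```python
import re

# Pattern the question wanted to avoid: both parts are optional,
# so it also succeeds on an empty string.
LOOSE = re.compile(r'[a-z]*[0-9]*')

# Pattern from the answer: each alternative requires at least one character.
STRICT = re.compile(r'[a-z]+[0-9]*|[a-z]*[0-9]+')


def lexeme(pattern, text):
    """Return the text matched at the start of the input, or None."""
    m = pattern.match(text)
    return m.group(0) if m else None


print(repr(lexeme(STRICT, 'test1234')))
print(repr(lexeme(STRICT, '')))
print(repr(lexeme(LOOSE, '')))
```

The strict pattern returns the whole of test1234 as one lexeme and fails on empty input, while the loose pattern "succeeds" with an empty match, which is rarely what an identifier rule should do.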