Trying "simple" parsing w/ antlr4 to reformat / pretty-print


Scenario:

  • I receive a mystery language with obviously very deep syntax
    • Enormous 10k mess, on a single line, represents 1 statement in mystery language
    • Suppose I don't initially care about the deep syntax
  • All I want to do is reformat it based on nested parentheses

My simplified language Rules:

  • Most of the text I don't care about, I just want to preserve it as-is
  • Opening paren means open a new level; matching closing paren means done with that level
  • Can have multiple parenthesized groups at the same level separated by commas, though I don't really care about that
  • Can have multiple adjacent opening and closing parentheses
  • There may or may not be text before the first paren, and after the last paren

I've tried a bunch of different grammars, starting with the ArrayInit example in the Antlr 4 Reference book.

This is one failed attempt:

grammar NestedParens ;
init: STR* ( '(' value (value)* ')' )* STR* ;
value: init
     | STR
     ;
STR: [^()]+ ;

The error ANTLR gives:

"error(153): NestedParens.g4:5:0: rule init contains a closure with at least one alternative that can match an empty string"

(line number might be off from what I posted)

A few thoughts:

  • I think the valid zero-length strings are an issue, but not sure how to factor around them
  • Maybe Antlr, which I believe is always top-down, isn't the right tool for this?
  • Maybe there's an easier tool that lets you only specify that you care about matching parens, braces, brackets, etc?

Answer by Onur:

Try this grammar:

grammar NestedParens ;
init: STR* ( '(' value (value)* ')' ) STR* ;
value: init
     | STR
     ;
// ANTLR 4 negates a character set with ~; [^()] would match the literal characters ^, ( and )
STR: ~[()]+ ;

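The key difference is the removed * after the parenthesized group. In your version every part of init was optional, so init could match the empty string. Because value can be init, the closure (value)* then contained an alternative that matches nothing, which is what error 153 complains about. With the group required, init always consumes at least one pair of parentheses.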

This would also work, and it lets a single init match several parenthesized groups in a row:

grammar NestedParens ;
init: STR* ( '(' value (value)* ')' )+ STR* ;
value: init
     | STR
     ;
// ANTLR 4 negates a character set with ~; [^()] would match the literal characters ^, ( and )
STR: ~[()]+ ;
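
On the question of whether a simpler tool would do: if indentation by paren depth is really all that is needed, a depth counter may be enough and no grammar is required. The following is only a rough Python sketch, not part of the ANTLR solution above. The function name reformat and the two-space indent are my own choices, and it assumes that parentheses never appear inside string literals or comments of the mystery language. It also trims whitespace around each text chunk, so it is not a strict "preserve as-is".

```python
def reformat(text: str, indent: str = "  ") -> str:
    """Put every parenthesis on its own line, indented by nesting depth.

    Assumption: every '(' and ')' is structural (none are hidden inside
    string literals or comments of the input language).
    """
    lines = []   # finished output lines
    buf = []     # characters of the text chunk currently being collected
    depth = 0    # current nesting level

    def flush():
        # Emit the collected text (if any) at the current depth.
        chunk = "".join(buf).strip()
        if chunk:
            lines.append(indent * depth + chunk)
        buf.clear()

    for ch in text:
        if ch == "(":
            flush()
            lines.append(indent * depth + "(")
            depth += 1
        elif ch == ")":
            flush()          # text before ')' belongs to the inner level
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced ')'")
            lines.append(indent * depth + ")")
        else:
            buf.append(ch)

    flush()
    if depth != 0:
        raise ValueError("unbalanced '('")
    return "\n".join(lines)


if __name__ == "__main__":
    print(reformat("a(b,(c)d)e"))
```

Text before the first paren and after the last one is kept at depth 0, and adjacent parens such as "((" or ")(" simply produce consecutive lines. If the language turns out to have string literals containing parens, that is the point where a real lexer (ANTLR or otherwise) starts to pay off.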