Scenario:
- I receive a mystery language with obviously very deep syntax
- Enormous 10k mess, on a single line, represents 1 statement in mystery language
- Suppose I don't initially care about the deep syntax
- All I want to do is reformat it based on nested parentheses
My simplified language rules:
- Most of the text I don't care about, I just want to preserve it as-is
- Opening paren means open a new level; matching closing paren means done with that level
- Can have multiple parenthesized groups at the same level separated by commas, though I don't really care about that
- Can have multiple adjacent opening and closing parentheses
- There may or may not be text before the first paren, and after the last paren
I've tried a bunch of different grammars, starting with the ArrayInit example in the Antlr 4 Reference book.
This is one failed attempt:
grammar NestedParens ;
init: STR* ( '(' value (value)* ')' )* STR* ;
value: init
| STR
;
STR: [^()]+ ;
The error ANTLR gives:
"error(153): NestedParens.g4:5:0: rule init contains a closure with at least one alternative that can match an empty string"
(line number might be off from what I posted)
A few thoughts:
- I think the valid zero-length strings are an issue, but not sure how to factor around them
- Maybe Antlr, which I believe is always top-down, isn't the right tool for this?
- Maybe there's an easier tool that lets you only specify that you care about matching parens, braces, brackets, etc?
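On the "easier tool" thought: if only the parentheses matter, the reformatting can be done without a parser generator at all. Below is a minimal sketch in Python, not part of the original post. The function name `reformat_parens` and the two-space indent are hypothetical choices, and the sketch assumes parentheses never appear inside string literals or comments of the mystery language. It also trims whitespace at the edges of each text chunk, which is a small departure from preserving the text exactly as-is.

```python
def reformat_parens(text: str, indent: str = "  ") -> str:
    """Put every parenthesis on its own line and indent text by nesting depth.

    Hypothetical helper: assumes '(' and ')' are always structural,
    i.e. never part of a string literal or comment.
    """
    lines = []   # finished output lines
    buf = []     # characters of the text chunk currently being collected
    depth = 0    # current nesting level

    def flush():
        # Emit the collected chunk (if any) at the current depth.
        chunk = "".join(buf).strip()
        if chunk:
            lines.append(indent * depth + chunk)
        buf.clear()

    for ch in text:
        if ch == "(":
            flush()
            lines.append(indent * depth + "(")
            depth += 1          # opening paren opens a new level
        elif ch == ")":
            flush()
            if depth == 0:
                raise ValueError("unbalanced ')'")
            depth -= 1          # matching closing paren ends that level
            lines.append(indent * depth + ")")
        else:
            buf.append(ch)      # everything else is kept as opaque text

    flush()
    if depth != 0:
        raise ValueError("unclosed '('")
    return "\n".join(lines)


if __name__ == "__main__":
    print(reformat_parens("f(a, g(b), (c)) tail"))
```

A counter like this is enough because the only structure being tracked is nesting depth. A grammar becomes worthwhile once the text between the parentheses needs to be understood as well.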
Try this grammar:
The difference is one missing *. Without it, init, which could match the empty string in your version (the error that the ANTLR compiler complained about), no longer can.
This would also work: