Abstract problem description:
The way I see it, unparsing means to create a token stream from an AST, which when parsed again produces an equal AST.
So parse(unparse(AST)) = AST
holds.
This is the equal to finding a valid parse tree which would produce the same AST.
The language is described by a context free S-attributed grammar using a eBNF variant.
So the unparser has to find a valid 'path' through the traversed nodes in which all grammar constraints hold. This bascially means to find a valid allocation of AST nodes to grammar production rules. This is a constraint satisfaction problem (CSP) in general and could be solved, like parsing, by backtracking in O(exp(n)).
Fortunately for parsing, this can be done in O(n³) using GLR (or better restricting the grammar). Because the AST structure is so close to the grammar production rule structure, I was really surprised seeing an implementation where the runtime is worse than parsing: XText uses ANTLR for parsing and backtracking for unparsing.
Questions
- Is a context free S-attribute grammar everything a parser and unparser need to share or are there further constraints, e.g. on the parsing technique / parser implementation?
- I've got the feeling this problem isn't O(exp(n)) in general - could some genius help me with this?
- Is this basically a context-sensitive grammar?
Example1:
Area returns AnyObject -> Pedestrian | Highway
Highway returns AnyObject -> "Foo" Car
Pedestrian returns AnyObject -> "Bar" Bike
Car returns Vehicle -> anyObjectInstance.name="Car"
Bike returns Vehicle -> anyObjectInstance.name="Bike"
So if I have an AST containing
AnyObject -> AnyObject -> Vehicle [name="Car"]
and I know the root can be Area, I could resolve it to
Area -> Highway -> Car
So the (Highway | Pedestrian) decision depends on the subtree decisions. The problem get's worse when a leaf might be, at first sight, one of several types, but has to be a specific one to form a valid path later on.
Example2:
So if I have S-attribute rules returning untyped objects, just assigning some attributes, e.g.
A -> B C {Obj, Obj}
X -> Y Z {Obj, Obj}
B -> "somekeyword" {0}
Y -> "otherkeyword" {0}
C -> "C" {C}
Z -> "Z" {Z}
So if an AST contains
Obj
/ \
"0" "C"
I can unparse it to
A
/ \
B C
just after I could resolve "C" to C.
While traversing the AST, all constraints I can generate from the grammar are satisfied for both rules, A and X, until I hit "C". This means that for
A -> B | C
B -> "map" {MagicNumber_42}
C -> "foreach" {MagicNumber_42}
both solutions for the tree
Obj
|
MagicNumber_42
are valid and it is considered that they have equal semantics ,e.g. syntactic sugar.
Further Information:
Question 1: no, the grammar itself may not be enough. Take the example of an ambiguous grammar. If you ended up with a unique leftmost (rightmost) derivation (the AST) for a given string, you would somehow have to know how the parser eliminated the ambiguity. Just think of the string 'a+b*c' with the naive grammar for expressions 'E:=E+E|E*E|...'.
Question 3: none of the grammar examples you give is context sensitive. The lefthand-side of the productions are a single non-terminal, there is no context.