How to get the abstract syntax tree from a language's concrete syntax tree?


How to use the concrete syntax tree to parse a file and generate the abstract syntax tree?

I came across concrete syntax trees in this blog post about ungrammar, but I can't wrap my head around how to build the parser.


There is 1 answer

Answer by CAD97

A concrete syntax tree is just a lossless representation of source code in tree form. It's basically a superset of an abstract syntax tree: it contains the same information with roughly the same structure, plus the extra "trivia" (whitespace, comments, and the like) that an abstract syntax tree would throw away.
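To make the "superset" idea concrete, here is a minimal sketch in Rust. All of the type and function names (`Cst`, `Ast`, `TokenKind`, `NodeKind`, `example_cst`) are made up for illustration and are not taken from ungrammar or any real library. It assumes a toy language with only integer literals and `+`. The CST keeps every token, including whitespace and comments, so the source can be reproduced exactly; lowering to an AST is then just a tree walk that skips the trivia.

```rust
/// Hypothetical token kinds for a toy language (illustration only).
#[derive(Debug, Clone, Copy, PartialEq)]
enum TokenKind { Whitespace, Comment, Number, Plus }

/// Hypothetical interior node kinds.
#[derive(Debug, Clone, Copy, PartialEq)]
enum NodeKind { Literal, BinaryAdd }

/// Concrete syntax tree: every character of the source lives in some token.
#[derive(Debug, Clone, PartialEq)]
enum Cst {
    Token { kind: TokenKind, text: String },
    Node { kind: NodeKind, children: Vec<Cst> },
}

/// Abstract syntax tree: only what later compiler passes care about.
#[derive(Debug, Clone, PartialEq)]
enum Ast {
    Number(i64),
    Add(Box<Ast>, Box<Ast>),
}

impl Cst {
    /// Lossless: concatenating all token texts gives back the source.
    fn text(&self) -> String {
        match self {
            Cst::Token { text, .. } => text.clone(),
            Cst::Node { children, .. } => children.iter().map(Cst::text).collect(),
        }
    }

    /// Lower to an AST by walking the tree and ignoring trivia tokens.
    fn to_ast(&self) -> Option<Ast> {
        match self {
            // A bare token is not an expression on its own.
            Cst::Token { .. } => None,
            Cst::Node { kind: NodeKind::Literal, children } => {
                children.iter().find_map(|child| match child {
                    Cst::Token { kind: TokenKind::Number, text } => {
                        text.parse::<i64>().ok().map(Ast::Number)
                    }
                    _ => None,
                })
            }
            Cst::Node { kind: NodeKind::BinaryAdd, children } => {
                // Child nodes are the operands; tokens (operator, trivia) lower to None.
                let mut operands = children.iter().filter_map(Cst::to_ast);
                let lhs = operands.next()?;
                let rhs = operands.next()?;
                Some(Ast::Add(Box::new(lhs), Box::new(rhs)))
            }
        }
    }
}

fn tok(kind: TokenKind, text: &str) -> Cst {
    Cst::Token { kind, text: text.to_string() }
}

/// Hand-built CST for the source text `1 + 2 // sum`.
fn example_cst() -> Cst {
    let lit = |s: &str| Cst::Node {
        kind: NodeKind::Literal,
        children: vec![tok(TokenKind::Number, s)],
    };
    Cst::Node {
        kind: NodeKind::BinaryAdd,
        children: vec![
            lit("1"),
            tok(TokenKind::Whitespace, " "),
            tok(TokenKind::Plus, "+"),
            tok(TokenKind::Whitespace, " "),
            lit("2"),
            tok(TokenKind::Whitespace, " "),
            tok(TokenKind::Comment, "// sum"),
        ],
    }
}
```

Calling `example_cst().text()` reproduces the original source including the comment, while `example_cst().to_ast()` yields just the addition of the two numbers. A real implementation would more likely expose typed wrappers over an untyped tree rather than copying into a second tree, but the idea is the same.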

If you're familiar with more traditional formal parsing techniques, you might also have heard it called just a "parse tree": the output of a parser generator that doesn't use semantic actions, which you would typically then post-process into an AST more amenable to later compiler passes.

A CST is closer to the AST in that it typically matches the semantic structure of the language more than the lexical structure, but ultimately they're all the same basic idea of a structure, just representing slightly different views of the parsed language.

So whether you're parsing to the formal parse tree, a CST, or directly to an AST (or even an IR bytecode), none of this has any direct impact on what parsing techniques you use, just on what structure you build up while parsing.
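As a rough illustration of that point, the sketch below runs the same parsing routine against two different "builders": one produces an AST, the other emits stack-machine bytecode directly. The `Builder` trait, `parse_sum`, and the `Op` instruction set are all hypothetical names invented for this example, and the parsing itself is deliberately trivial (it only handles `NUMBER ('+' NUMBER)*`) so that the focus stays on what gets built.

```rust
/// Hypothetical interface between the parser and whatever it builds.
trait Builder {
    type Out;
    fn number(&mut self, value: i64) -> Self::Out;
    fn add(&mut self, lhs: Self::Out, rhs: Self::Out) -> Self::Out;
}

#[derive(Debug, PartialEq)]
enum Ast {
    Num(i64),
    Add(Box<Ast>, Box<Ast>),
}

/// Instructions for an imaginary stack machine.
#[derive(Debug, PartialEq)]
enum Op {
    Push(i64),
    Add,
}

struct AstBuilder;

impl Builder for AstBuilder {
    type Out = Ast;
    fn number(&mut self, value: i64) -> Ast {
        Ast::Num(value)
    }
    fn add(&mut self, lhs: Ast, rhs: Ast) -> Ast {
        Ast::Add(Box::new(lhs), Box::new(rhs))
    }
}

struct BytecodeBuilder;

impl Builder for BytecodeBuilder {
    type Out = Vec<Op>;
    fn number(&mut self, value: i64) -> Vec<Op> {
        vec![Op::Push(value)]
    }
    fn add(&mut self, mut lhs: Vec<Op>, rhs: Vec<Op>) -> Vec<Op> {
        // Evaluate lhs, then rhs, then add the two values on top of the stack.
        lhs.extend(rhs);
        lhs.push(Op::Add);
        lhs
    }
}

/// Parses `NUMBER ('+' NUMBER)*`, left-associatively. The parsing logic
/// is identical no matter which builder is plugged in.
fn parse_sum<B: Builder>(src: &str, builder: &mut B) -> Option<B::Out> {
    let mut operands = src.split('+').map(|part| part.trim().parse::<i64>().ok());
    let mut acc = builder.number(operands.next()??);
    for operand in operands {
        let rhs = builder.number(operand?);
        acc = builder.add(acc, rhs);
    }
    Some(acc)
}
```

With `AstBuilder`, `parse_sum("1 + 2 + 3", ...)` gives a nested `Ast::Add`; with `BytecodeBuilder`, the same call gives a flat list of `Push` and `Add` instructions. A CST builder would slot in the same way, except that it would also need to be told about whitespace and comments.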

So your question boils down to the opinion question of "how should I parse source code," which is quite an open question. Parser combinators tend to be popular in Rust, but even just fixed-lookahead recursive descent is quite powerful and simple to do.
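For a sense of how little is needed, here is a minimal sketch of a recursive descent parser with a single character of lookahead, using only the standard library. The grammar, the `Expr` type, and the `Parser` struct are assumptions made up for this example (integers, `+`, `*`, and parentheses). It builds an AST directly and discards whitespace; a CST-producing version would record the skipped whitespace as tokens instead.

```rust
use std::iter::Peekable;
use std::str::Chars;

#[derive(Debug, PartialEq)]
enum Expr {
    Num(i64),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

struct Parser<'a> {
    // `Peekable` provides the one-character lookahead.
    chars: Peekable<Chars<'a>>,
}

impl<'a> Parser<'a> {
    fn new(src: &'a str) -> Self {
        Parser { chars: src.chars().peekable() }
    }

    /// Discard whitespace (an AST-building parser has no use for trivia).
    fn skip_ws(&mut self) {
        while self.chars.next_if(|c| c.is_whitespace()).is_some() {}
    }

    /// Consume `expected` if it is the next non-whitespace character.
    fn eat(&mut self, expected: char) -> bool {
        self.skip_ws();
        self.chars.next_if_eq(&expected).is_some()
    }

    /// expr := term ('+' term)*
    fn expr(&mut self) -> Result<Expr, String> {
        let mut lhs = self.term()?;
        while self.eat('+') {
            let rhs = self.term()?;
            lhs = Expr::Add(Box::new(lhs), Box::new(rhs));
        }
        Ok(lhs)
    }

    /// term := atom ('*' atom)*
    fn term(&mut self) -> Result<Expr, String> {
        let mut lhs = self.atom()?;
        while self.eat('*') {
            let rhs = self.atom()?;
            lhs = Expr::Mul(Box::new(lhs), Box::new(rhs));
        }
        Ok(lhs)
    }

    /// atom := NUMBER | '(' expr ')'
    fn atom(&mut self) -> Result<Expr, String> {
        if self.eat('(') {
            let inner = self.expr()?;
            if !self.eat(')') {
                return Err("expected ')'".to_string());
            }
            return Ok(inner);
        }
        // `eat` has already skipped leading whitespace at this point.
        let mut digits = String::new();
        while let Some(d) = self.chars.next_if(|c| c.is_ascii_digit()) {
            digits.push(d);
        }
        digits
            .parse::<i64>()
            .map(Expr::Num)
            .map_err(|_| format!("expected a number, found {:?}", self.chars.peek()))
    }
}

/// Parse a whole input string, rejecting trailing garbage.
fn parse(src: &str) -> Result<Expr, String> {
    let mut parser = Parser::new(src);
    let expr = parser.expr()?;
    parser.skip_ws();
    match parser.chars.peek() {
        None => Ok(expr),
        Some(c) => Err(format!("unexpected {:?}", c)),
    }
}
```

Each grammar rule becomes one function, and operator precedence falls out of which function calls which: `parse("1 + 2 * 3")` groups the multiplication first because `expr` delegates to `term`. Parser combinator libraries express the same structure with composable values instead of hand-written functions.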