Writing correct LL(1) grammars?

2.8k views Asked by At

I'm currently trying to write a (very) small interpreter/compiler for a programming language. I have set the syntax for the language, and I now need to write down the grammar for the language. I intend to use an LL(1) parser because, after a bit of research, it seems that it is the easiest to use.

I am new to this domain, but from what I gathered, formalising the syntax using BNF or EBNF is highly recommended. However, it seems that not all grammars are suitable for implementation using an LL(1) parser. Therefore, I was wondering what was the correct (or recommended) approach to writing grammars in LL(1) form.

Thank you for your help, Charlie.

PS: I intend to write the parser using Haskell's Parsec library.

EDIT: Also, according to SK-logic, Parsec can handle an infinite lookahead (LL(k) ?) - but I guess the question still stands for that type of grammar.

3

There are 3 answers

0
Timo On

I'm not an expert on this as I have only made a similar small project with an LR(0) parser. The general approach I would recommend:

  1. Get the arithmetics working. By this, make rules and derivations for +, -, /, * etc and be sure that the parser produces a working abstract syntax tree. Test and evaluate the tree on different input to ensure that it does the arithmetic correctly. Make things step by step. If you encounter any conflict, resolve it first before moving on.

  2. Get simper constructs working like if-then-else or case expressions working.

  3. Going further depends more on the language you're writing the grammar for.

Definetly check out other programming language grammars as an reference (unfortunately I did not find in 1 min any full LL grammar for any language online, but LR grammars should be useful as an reference too). For example:

ANSI C grammar

Python grammar

and of course some small examples in Wikipedia about LL grammars Wikipedia LL Parser that you probably have already checked out.

I hope you find some of this stuff useful

4
Apalala On

There are algorithms both for determining if a grammar is LL(k). Parser generators implement them. There are also heuristics for converting a grammar to LL(k), if possible.

But you don't need to restrict your simple language to LL(1), because most modern parser generators (JavaCC, ANTLR, Pyparsing, and others) can handle any k in LL(k).

More importantly, it is very likely that the syntax you consider best for your language requires a k between 2 and 4, because several common programming constructs do.

0
Antimony On

So first off, you don't necessarily want your grammar to be LL(1). It makes writing a parser simpler and potentially offers better performance, but it does mean that you're language will likely end up more verbose than commonly used languages (which generally aren't LL(1)).

If that's ok, your next step is to mentally step through the grammar, imagine all possibilities that can appear at that point, and check if they can be distinguished by their first token.

There's two main rules of thumb to making a grammar LL(1)

  1. If of multiple choices can appear at a given point and they can start with the same token, add a keyword in front telling you which choice was taken.
  2. If you have an optional or repeated part, make sure it is followed by an ending token which can't appear as the first token of the optional/repeated part.
  3. Avoid optional parts at the beginning of a production wherever possible. It makes the first two steps a lot easier.