BNF grammar + Gold LALR parser, failing to distinguish special case NewLine from Whitespace

935 views Asked by At
  • I want to consider whitespaces and newlines as normal whitespaces.
  • I want to distinguish newlines from other whitespaces moreover to allow special case.

First attempt to write a compliant grammar fails.

Here is the grammar:

! ------------------------------------------------- Sets

{WS}           = {Whitespace} - {CR} - {LF}
{ID Head}      = {Letter} + [_]
{ID Tail}      = {Alphanumeric} + [_]
{String Chars} = {Printable} + {HT} - ["\]

! ------------------------------------------------- Terminals

! The following defines the Whitespace terminal using the {WS}
! set - which excludes the carriage return and line feed 
! characters

Whitespace    = {WS}+ | {CR}{LF} | {CR} | {LF}
!NewLine       = {CR}{LF} | {CR} | {LF}
MyNewLine      = {CR}{LF} | {CR} | {LF}
3

There are 3 answers

0
Brannon On BEST ANSWER

They are ambiguous because they both contain the same sub-set {CR}{LF} | {CR} | {LF}.

Given the input {CR}{LF} the parser has no way to tell which terminal it should match.

A table-driven parser isn't really designed to handle "special cases" directly. If you want to ignore new-lines in some contexts, but attribute meaning to them in others then you'll have to handle that in your reductions (i.e. tokenize the newlines separately, and discard them in your reductions), but that will get ugly.

A (potentially) better solution is to use tokenizer states (possibly controlled from the parser), to change how the newline inputs are tokenized. It's hard to say without fully understanding your grammar. Plus, it's been a few years since I've messed with this stuff.

0
batbrat On

I think the grammar is ambiguous in the sense that both Whitespace and MyNewLine match new line charachters. Since it throws a wobbly doing it your way, I suggest detecting whitespace and new lines separately and deciding what to do with the newline on a case by case basis.

I am not too experienced in the area, but thats what I remember from my Theory Of Computation class and Compiler Design class.

I hope this helps.

3
menjaraz On

A late answer.

To my dismay, I'm just a recent late bloomer ;-) member.

Keep using the usual Line-Based Grammar Declarations

! ====================================================================
{Whitespace Ch} = {Whitespace} - {CR} - {LF}

Whitespace = {Whitespace Ch}+
Newline    = {CR}{LF} | {CR} | {LF}
! ====================================================================

Whitespace vs. Newline distinction is already taken into account!

Consider addressing your special case when writing your production rules.

For complex case you may even need to define some virtual terminal (advanced technique).

You may elaborate your grammar and ask by posting it again.

Last Edit: Please, share if you've already addressed the issue. Thanks.