Forcing gaps between words in a Marpa grammar

Question

Forcing gaps between words in a Marpa grammar

283 views Asked by Adrian Pronk At 08 September 2013 at 23:21

I'm trying to set up a grammar that requires that [\w] characters cannot appear directly adjacent to each other if they are not in the same lexeme. That is, words must be separated from each other by a space or punctuation.

Consider the following grammar:

use Marpa::R2; use Data::Dump;

my $grammar = Marpa::R2::Scanless::G->new({source  => \<<'END_OF_GRAMMAR'});

:start ::= Rule
Rule ::= '9' 'september'

:discard ~ whitespace
whitespace ~ [\s]+

END_OF_GRAMMAR

my $recce = Marpa::R2::Scanless::R->new({grammar => $grammar});
dd $recce->read(\'9september');

This parses successfully. Now I want to change the grammar to force a separation between 9 and september. I thought of doing this by introducing an unused lexeme that matches [\w]+:

use Marpa::R2; use Data::Dump;

my $grammar = Marpa::R2::Scanless::G->new({source  => \<<'END_OF_GRAMMAR'});

:start ::= Rule
Rule ::= '9' 'september'

:discard ~ whitespace
whitespace ~ [\s]+

word ~ [\w]+      ### <== Add unused lexeme to match joined keywords
END_OF_GRAMMAR

my $recce = Marpa::R2::Scanless::R->new({grammar => $grammar});
dd $recce->read(\'9september');

Unfortunately, this grammar fails with:

A lexeme is not accessible from the start symbol: word
Marpa::R2 exception at marpa.pl line 3.

Although this can be resolved by using a lexeme default statement:

use Marpa::R2; use Data::Dump;

my $grammar = Marpa::R2::Scanless::G->new({source  => \<<'END_OF_GRAMMAR'});
lexeme default = action => [value]  ### <== Fix exception by adding lexeme default statement

:start ::= Rule
Rule ::= '9' 'september'

:discard ~ whitespace
whitespace ~ [\s]+

word ~ [\w]+
END_OF_GRAMMAR

my $recce = Marpa::R2::Scanless::R->new({grammar => $grammar});
dd $recce->read(\'9september');

This results in the following output:

Inaccessible symbol: word
Error in SLIF parse: No lexemes accepted at line 1, column 1
* String before error: 
* The error was at line 1, column 1, and at character 0x0039 '9', ...
* here: 9september
Marpa::R2 exception at marpa.pl line 16.

That is, the parse has failed due to the fact that there is no gap between 9 and september which is exactly what I want to happen. The only fly in the ointment is that there is an annoying Inaccessible symbol: word message on STDERR because the word lexeme is not used in the actual grammar.

I see that in Marpa::R2::Grammar I could have declared word as inaccessible_ok in the constructor options but I can't do that in Marpa::R2::Scanless.

I also could have done something like the following:

Rule ::= nine september
nine ~ word
september ~ word

then used a pause to use custom code to examine the actual lexeme value and return the appropriate lexeme depending on the value.

What is the best way to construct a grammar that uses keywords or numbers and words but will disallow adjacent lexemes to be run together without white space or punctuation separating them?

Original Q&A

There are 1 answers

**amon** · Answer 1 · 2013-09-08T23:35:35+00:00

Well, the obvious solution is to require some whitespace in between (on the G1 level). When we use the following grammar

:default ::= action => ::array

:start ::= Rule
Rule ::= '9' (Ws) 'september'

Ws ::= [\s]+

:discard ~ whitespace
whitespace ~ [\s]+

then 9september fails, but 9 september is parsed. Important points to note:

Lexemes can be both discarded and required, when they are both a longest token. This is why the :discard and Ws rule don't interfere with each other. Marpa doesn't mind this kind of “ambiguity”.
The Ws rule is enclosed in parens, which discards the value – to keep the resulting parse tree clean.
You do not usually want to use tricks like phantom lexemes to misguide the parser. That way lies breakage.
When every bit of whitespace is important, you might want to get rid of :discard ~ whitespace. This is meant to be used e.g. for C-like languages where whitespace traditionally does not matter.

TechQA.

Forcing gaps between words in a Marpa grammar

There are 1 answers

Related Questions in PERL

Related Questions in PARSING

Related Questions in MARPA

Popular Questions

Trending Questions