Parsing Context Sensitive Language

4.7k views Asked by At

i am reading the Definitive ANTLR reference by Terence Parr, where he says:

Semantic predicates are a powerful means of recognizing context-sensitive language structures by allowing runtime information to drive recognition

But the examples in the book are very simple. What i need to know is: can ANTLR parse context-sensitive rules like:

xAy --> xBy

If ANTLR can't parse these rules, is there is another tool that deals with context-sensitive grammars?

2

There are 2 answers

2
Ira Baxter On BEST ANSWER

ANTLR parses only grammars which are LL(*). It can't parse using grammars for full context-sensitive languages such as the example you provided. I think what Parr meant was that ANTLR can parse some languages that require some (left) context constraints.

In particular, one can use semantic predicates on "reduction actions" (we do this for GLR parsers used by our DMS Software Reengineering Toolkit but the idea is similar for ANTLR, I think) to inspect any data collected by the parser so far, either as ad hoc side effects of other semantic actions, or in a partially-built parse tree.

For our DMS-based DMS-based Fortran front end, there's a context-sensitive check to ensure that DO-loops are properly lined up. Consider:

 DO  20, I= ...
   DO 10, J = ...
       ...
20  CONTINUE
10  CONTINUE

From the point of view of the parser, the lexical stream looks like this:

DO  <number> , <variable> =  ...
    DO <number> , <variable> = ...
         ...
<number> CONTINUE
<number> CONTINUE

How can the parser then know which DO statement goes with which CONTINUE statement? (saying that each DO matches its closest CONTINUE won't work, because FORTRAN can share a CONTINUE statement with multiple DO-heads).

We use a semantic predicate "CheckMatchingNumbers" on the reduction for the following rule:

block = 'DO' <number> rest_of_do_head newline 
         block_of_statements
         <number> 'CONTINUE' newline ; CheckMatchingNumbers

to check that the number following the DO keyword, and the number following the CONTINUE keyword match. If the semantic predicate says they match, then a reduction for this rule succeeds and we've aligned the DO head with correct CONTINUE. If the predicate fails, then no reduction is proposed (and this rule is removed from candidates for parsing the local context); some other set of rules has to parse the text.

The actual rules and semantic predicates to handle FORTRAN nesting with shared continues is more complex than this but I think this makes the point.

What you want is full context-sensitive parsing engine. I know people have built them, but I don't know of any full implementations, and don't expect them to be fast.

I did follow Quinn Taylor Jackson's MetaS grammar system for awhile; it sounded like a practical attempt to come close.

0
Anderson Green On

It is comparatively easy to write a context-sensitive parser in Prolog. This program parses the string [a,is,less,than,b,and,b,is,less,than,c], converting it into [a,<,b,<,c]:

:- initialization(main).
:- set_prolog_flag('double_quotes','chars').

main :-
    rewrite_system([a,is,less,than,b,and,b,is,less,than,c],X),writeln('\nFinal output:'),writeln(X).

rewrite_rule([[A,<,B],and,[B,<,C]],[A,<,B,<,C]).
rewrite_rule([A,is,less,than,B],[A,<,B]).
rewrite_rule([[A,<,B],and,C,than,D],[[A,<,B],and,A,is,C,than,D]).
rewrite_rule([A,<,B],[[A,<,B]]).

rewritten(A) :- atom(A);bool(A).
bool(A) :- atom(A).
bool([A,<,B,<,C]) :- atom(A),atom(B),atom(C).
bool([A,and,B]) :- bool(A),bool(B).


% this predicate is from https://stackoverflow.com/a/8312742/975097
replace(ToReplace, ToInsert, List, Result) :-
    once(append([Left, ToReplace, Right], List)),
    append([Left, ToInsert, Right], Result).

rewrite_system(Input,Output) :-
    rewritten(Input),Input=Output;
    rewrite_rule(A,B),
    replace(A,B,Input,Input1),
    writeln(Input1),
    rewrite_system(Input1,Output).

Using the same algorithm, I also wrote an adaptive parser that "learns" new rewrite rules from its input.