How can I obtain a pointer to a Grammar token or regex?


This is similar to this question for classes, except the same procedure does not seem to work for Grammars.

grammar TestGrammar {
    token num { \d+ }
}


my $test-grammar = TestGrammar.new();
my $token = $test-grammar.^lookup('num');

say "3" ~~ $token;

This returns:

Type check failed in binding to parameter '<anon>'; expected TestGrammar but got Match (Match.new(:orig("3")...)
  in regex num at pointer-to-token.raku line 2
  in block <unit> at pointer-to-token.raku line 9

This suggests that the token has to be invoked on a grammar object rather than called as a "bare" token. However, it's not clear how to do that. Passing the grammar or an instance of it as a parameter produces a different error:

Cannot look up attributes in a TestGrammar type object. Did you forget a '.new'?

Any idea why this does not work?

Update: using ^find_method as indicated in this question that is referenced from the one above does not work either. Same issue. Using assuming does not fix it either.

Update 2: I seem to be getting somewhere here:

my $token = $test-grammar.^lookup('num').assuming($test-grammar);

say "33" ~~ $token;

This does not raise any error; however, it returns False no matter what.

Best answer, by raiph:

You're missing an argument at the end of your code:

grammar TestGrammar {
    token num { \d+ }
}

my $test-grammar = TestGrammar.new();
my $token = $test-grammar.^lookup('num');

say "3" ~~ $token($test-grammar.new: orig => $_);
                 ^^ -- the missing/new bit -- ^^

I'm confident you can more or less tuck that argument away -- but .assuming immediately reifies/evaluates the argument(s) being assumed so that won't work out. Instead we need to postpone that step until the smart match call (to get ahold of the $_ as it is during the smart match call).


We need to change the $token declaration and call. Here are two possibilities I can think of:

  • Stick with $token

    Change its declaration, and turn its use with ~~ into a method call:

    my $token = { $test-grammar.^lookup('num')($test-grammar.new: orig => $_) }
    
    say "3" ~~ .$token;
               ^ insert dot to make it a method call with `$_` as invocant
    
  • Switch to &token

    Now there's no need for the dot in the smart match line. Even better, you can drop the sigil:

    my &token = { $test-grammar.^lookup('num')($test-grammar.new: orig => $_) }
    
    say "3" ~~ .&token; # Same as:
    say "3" ~~  &token; # Same as:
    say "3" ~~   token;
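
Both variants rely on the same idea: build the grammar object only when the match is attempted. As a minimal sketch of that idea, the lookup and the deferred construction can be wrapped in a small helper. The name matcher-for and its parameters are hypothetical and not part of Raku or of this answer; the sketch also assumes, as in the code above, that the looked-up rule can be called like a sub with a freshly built grammar object as its invocant.

```raku
grammar TestGrammar {
    token num { \d+ }
}

# Hypothetical helper, not a built-in: look the rule up once, but create
# the grammar object only when the returned block is actually called.
sub matcher-for($grammar, Str $rule-name) {
    my $rule = $grammar.^lookup($rule-name);
    return -> $target { $rule($grammar.new: orig => $target) };
}

my $test-grammar = TestGrammar.new();
my &num = matcher-for($test-grammar, 'num');

# Smart matching against a Callable calls it with the left-hand side.
say "3" ~~ &num;
```

Because the returned block takes its target as an explicit parameter, it does not depend on what `$_` happens to be at the point of the call.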
    

A proper answer to your question should really address these three questions:

  • Why does one have to pass a new grammar object?

  • What's this orig business?

  • How could anyone have known this?

I'm not going to answer those questions, at least not adequately/tonight, and perhaps never. (I recall investigating this years ago and getting bogged down in Rakudo code.)

From memory, my working hypothesis boils down to this:

  • There's a fundamental aspect of the regex/grammar machinery wherein it presumes a match/grammar object is set up at the start (and then passed along to subrules as matching happens).

  • There's a difference between a method/rule declared with my vs with has regarding how that happens. (Presumably what self is bound to.)

  • That difference means user code has to deal with this disparity in the scenario covered by your question.