Treetop parser : Function definition syntax - n arguments


I'm trying to describe some basic Ruby grammar with Treetop, but I'm stuck on function definitions: I don't know how to handle an arbitrary number ('n') of arguments. Here is the rule I use to handle functions with 0 to 2 arguments:

  rule function_definition
    'def' space? identifier space? '(' space? expression? space? ','? expression? space? ')'
      block
    space? 'end' <FunctionDefinition>
  end  

How can I handle 'n' arguments? Is there a recursive way to do that?

EDIT :

I want to stress that I need the arguments to appear in the result tree, for example:

 Argument offset=42, "arg1"
 Argument offset=43, "arg2"
 Argument offset=44, "arg3"

So I need to declare a custom SyntaxNode subclass for the arguments, just as I did for the function_definition rule.

There are 2 answers

Martin Carpenter (best answer)

You want something like (untested):

'def' space? identifier space? '(' space? ( expression ( space? ',' space? expression )* )? space? ')'

(NB: if this is a Ruby-style def, the parentheses are also optional when there are no arguments.)
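To make the "one argument, then zero or more comma-plus-argument groups" shape concrete, here is a minimal plain-Ruby sketch of the same pattern. It is not Treetop: it uses StringScanner from the standard library, and the names Argument and parse_arg_list are made up for illustration. It records the offset of each argument, since the question asks for that.

```ruby
require 'strscan'

# Hypothetical value object: one parsed argument and the offset where it starts.
Argument = Struct.new(:offset, :name)

# Hand-written equivalent of: '(' ( arg ( ',' arg )* )? ')'
# Returns an array of Argument, or nil if the list is malformed.
def parse_arg_list(source)
  scanner = StringScanner.new(source)
  return nil unless scanner.skip(/\(\s*/)

  args = []
  name_pattern = /[a-zA-Z_][a-zA-Z0-9_]*/

  # Optional first argument
  offset = scanner.pos
  if (name = scanner.scan(name_pattern))
    args << Argument.new(offset, name)

    # Zero or more ", argument" repetitions
    while scanner.skip(/\s*,\s*/)
      offset = scanner.pos
      name = scanner.scan(name_pattern)
      return nil unless name # a comma must be followed by an argument
      args << Argument.new(offset, name)
    end
  end

  # The closing parenthesis must end the input
  scanner.skip(/\s*\)/) && scanner.eos? ? args : nil
end

parse_arg_list('(arg1, arg2, arg3)').each do |arg|
  puts "Argument offset=#{arg.offset}, #{arg.name.inspect}"
end
```

The loop plays the role of the `( ... )*` repetition in the grammar, and the enclosing `if` plays the role of the outer `( ... )?`.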

Edit to demonstrate extracting the arguments from the parse tree -- here I spit out the text_value of each argument (FunctionArg) syntax node but you could of course do anything:

foo.rb:

# Prepend current directory to load path
$:.push('.')

# Load treetop grammar directly without compilation
require 'polyglot'
require 'treetop'
require 'def'

# Classes for bespoke nodes
class FunctionDefinition < Treetop::Runtime::SyntaxNode ; end
class FunctionArg < Treetop::Runtime::SyntaxNode ; end

# Some tests
[
  'def foo() block end',
  'def foo(arg1) block end',
  'def foo(arg1, arg2) block end',
  'def foo(arg1, arg2, arg3) block end',
].each do |test|
  parser = DefParser.new
  tree = parser.parse( test )
  raise RuntimeError, "Parsing failed on line:\n#{test}" unless tree
  puts test
  puts "identifier=#{tree.function_identifier}"
  puts "args=#{tree.function_args.inspect}"
  puts
end

def.tt:

grammar Def

  # Top level rule: a function
  rule function_definition
    'def' space identifier space? '(' space? arg0 more_args space? ')' space block space 'end' <FunctionDefinition>
    {
      def function_identifier
        identifier.text_value
      end
      def function_args
        arg0.is_a?( FunctionArg ) ? [ arg0.text_value ] + more_args.args : []
      end
    }
  end

  # First function argument
  rule arg0
    argument?
  end

  # Second and further function arguments
  rule more_args
    ( space? ',' space? argument )* 
    {
      def args
        elements.map { |e| e.elements.last.text_value }
      end
    }
  end

  # Function identifier
  rule identifier
    [a-zA-Z_] [a-zA-Z0-9_]*
  end

  # TODO Dummy rule for function block
  rule block
    'block'
  end

  # Function argument
  rule argument
    [a-zA-Z_] [a-zA-Z0-9_]* <FunctionArg>
  end

  # Horizontal whitespace (htab or space character).
  rule space
    [ \t]
  end

end

Output:

def foo() block end
identifier=foo
args=[]

def foo(arg1) block end
identifier=foo
args=["arg1"]

def foo(arg1, arg2) block end
identifier=foo
args=["arg1", "arg2"]

def foo(arg1, arg2, arg3) block end
identifier=foo
args=["arg1", "arg2", "arg3"]

Dyson Wilkes

A better method might be to use recursion.

rule function_definition
  'def' space identifier space? '(' space? argList? space? ')' space block space 'end'
end

rule argList
   identifier space? ',' space? argList
   / identifier
end