Get line number in yecc


I'm using yecc to parse my tokenized asm-like code. Given input like "MOV [1], [2]\nJMP hello", the lexer produces these tokens:

[{:opcode, 1, :MOV}, {:register, 1, 1}, {:",", 1}, {:register, 1, 2},
  {:opcode, 2, :JMP}, {:identifer, 2, :hello}]

When I parse these tokens, I get:

[%{operation: [:MOV, [:REGISTER, 1], [:REGISTER, 2]]},
  %{operation: [:JMP, [:CONST, :hello]]}]

But I want every operation to carry its line number, so that later stages of the code can report meaningful errors.

So I changed my parser to this:

Nonterminals
code statement operation value.

Terminals
label identifer integer ',' opcode register address address_in_register line_number.

Rootsymbol code.

code -> line_number statement      : [{get_line('$1'), '$2'}].
code -> line_number statement code : [{get_line('$1'), '$2'} | '$3'].
%code -> statement      : ['$1'].
%code -> statement code : ['$1' | '$2'].

statement -> label     : #{'label' => label('$1')}.
statement -> operation : #{'operation' => '$1'}.

operation -> opcode value ',' value : [operation('$1'), '$2', '$4'].
operation -> opcode value           : [operation('$1'), '$2'].
operation -> opcode identifer       : [operation('$1'), value('$2')].
operation -> opcode                 : [operation('$1')].

value -> integer  : value('$1').
value -> register : value('$1').
value -> address  : value('$1').
value -> address_in_register : value('$1').

Erlang code.
get_line({_, Line, _})                 -> Line.

operation({opcode, _, OpcodeName})     -> OpcodeName.

label({label, _, Value})               -> Value.

value({identifer, _, Value})           -> ['CONST', Value];
value({integer, _, Value})             -> ['CONST', Value];
value({register, _, Value})            -> ['REGISTER', Value];
value({address, _, Value})             -> ['ADDRESS', Value];
value({address_in_register, _, Value}) -> ['ADDRESS_IN_REGISTER', Value].

(The commented-out rules are the old, working ones.)

Now, with the same input, I get:

{:error, {1, :assembler_parser, ['syntax error before: ', ['\'MOV\'']]}}

How can I fix this?

Answer by José Valim (best answer)

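My suggestion is to keep the line numbers inside the tokens, rather than introducing separate line_number tokens, and then change how you build the operations. Judging by the token list you posted, the lexer never emits a line_number token, so the new code rules cannot match a stream that starts with an opcode; that would explain the syntax error before 'MOV'.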

So I would suggest this:

operation -> opcode value ',' value : [operation('$1'), line('$1'), '$2', '$4'].
operation -> opcode value           : [operation('$1'), line('$1'), '$2'].
operation -> opcode identifer       : [operation('$1'), line('$1'), value('$2')].
operation -> opcode                 : [operation('$1'), line('$1')].

line({_, Line, _}) -> Line.
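
For reference, here is a minimal sketch of how the whole grammar file might look with this first variant. It assumes the file is called assembler_parser.yrl (going by the module name in the error message), that the code rules go back to the old, commented-out form from the question, and that the unused line_number terminal is dropped. Everything else is copied from the question.

```erlang
% assembler_parser.yrl (assumed file name)
% Sketch: the line number is read from the opcode token itself,
% so no separate line_number terminal is needed.
Nonterminals
code statement operation value.

Terminals
label identifer integer ',' opcode register address address_in_register.

Rootsymbol code.

% The old, working rules from the question.
code -> statement      : ['$1'].
code -> statement code : ['$1' | '$2'].

statement -> label     : #{'label' => label('$1')}.
statement -> operation : #{'operation' => '$1'}.

% line('$1') pulls the line out of the opcode token.
operation -> opcode value ',' value : [operation('$1'), line('$1'), '$2', '$4'].
operation -> opcode value           : [operation('$1'), line('$1'), '$2'].
operation -> opcode identifer       : [operation('$1'), line('$1'), value('$2')].
operation -> opcode                 : [operation('$1'), line('$1')].

value -> integer  : value('$1').
value -> register : value('$1').
value -> address  : value('$1').
value -> address_in_register : value('$1').

Erlang code.
% Every three-element token is {Category, Line, Value}.
line({_, Line, _})                     -> Line.

operation({opcode, _, OpcodeName})     -> OpcodeName.

label({label, _, Value})               -> Value.

value({identifer, _, Value})           -> ['CONST', Value];
value({integer, _, Value})             -> ['CONST', Value];
value({register, _, Value})            -> ['REGISTER', Value];
value({address, _, Value})             -> ['ADDRESS', Value];
value({address_in_register, _, Value}) -> ['ADDRESS_IN_REGISTER', Value].
```

With the token list from the question, the operations should then come out shaped like [:MOV, 1, [:REGISTER, 1], [:REGISTER, 2]] and [:JMP, 2, [:CONST, :hello]], with the line number in second position.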

Or even this, if you want to mirror the Elixir AST, where every node is a {name, metadata, arguments} tuple:

operation -> opcode value ',' value : {operation('$1'), meta('$1'), ['$2', '$4']}.
operation -> opcode value           : {operation('$1'), meta('$1'), ['$2']}.
operation -> opcode identifer       : {operation('$1'), meta('$1'), [value('$2')]}.
operation -> opcode                 : {operation('$1'), meta('$1'), []}.

meta({_, Line, _}) -> [{line, Line}].
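
Since the goal is meaningful errors later on, here is a small sketch of how a later stage might read the line back out of such a node. The module name asm_errors and the function format_error/2 are hypothetical, chosen only for illustration; they are not part of yecc or of the question's code.

```erlang
-module(asm_errors).
-export([format_error/2]).

%% Hypothetical helper: takes an AST-style node {Op, Meta, Args}, as built by
%% the rules above, and a reason string, and returns a flat message string
%% that starts with the line number.
format_error({Op, Meta, _Args}, Reason) ->
    %% Meta is the [{line, Line}] list produced by meta/1.
    Line = proplists:get_value(line, Meta),
    lists:flatten(io_lib:format("line ~p: ~s: ~s", [Line, Op, Reason])).
```

For example, asm_errors:format_error({'JMP', [{line, 2}], [['CONST', hello]]}, "unknown label") returns "line 2: JMP: unknown label". Keeping the metadata as a keyword-style list also leaves room for adding more fields later (a column, say) without changing the shape of the nodes.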