I have an input file with multiple lines and fields separated by spaces. My definition files are:
scanner.xrl:
Definitions.
DIGIT = [0-9]
ALPHANUM = [0-9a-zA-Z_]
Rules.
(\s|\t)+ : skip_token.
\n : {end_token, {new_line, TokenLine}}.
{ALPHANUM}+ : {token, {string, TokenLine, TokenChars}}.
Erlang code.
parser.yrl:
Nonterminals line.
Terminals string.
Rootsymbol line.
Endsymbol new_line.
line -> string : ['$1'].
line -> string line: ['$1'|'$2'].
Erlang code.
When I run it as is, only the first line is parsed and then the parser stops:
1> A = <<"a b c\nd e\nf\n">>.
2> {ok, T, _} = scanner:string(binary_to_list(A)).
{ok,[{string,1,"a"},
{string,1,"b"},
{string,1,"c"},
{new_line,1},
{string,2,"d"},
{string,2,"e"},
{new_line,2},
{string,3,"f"},
{new_line,3}],
4}
3> parser:parse(T).
{ok,[{string,1,"a"},{string,1,"b"},{string,1,"c"}]}
If I remove the Endsymbol line from parser.yrl and change the scanner.xrl file as follows:
Definitions.
DIGIT = [0-9]
ALPHANUM = [0-9a-zA-Z_]
Rules.
(\s|\t|\n)+ : skip_token.
{ALPHANUM}+ : {token, {string, TokenLine, TokenChars}}.
Erlang code.
All my lines are parsed as a single item:
1> A = <<"a b c\nd e\nf\n">>.
<<"a b c\nd e\nf\n">>
2> {ok, T, _} = scanner:string(binary_to_list(A)).
{ok,[{string,1,"a"},
{string,1,"b"},
{string,1,"c"},
{string,2,"d"},
{string,2,"e"},
{string,3,"f"}],
4}
3> parser:parse(T).
{ok,[{string,1,"a"},
{string,1,"b"},
{string,1,"c"},
{string,2,"d"},
{string,2,"e"},
{string,3,"f"}]}
What would be the proper way to signal to the parser that each line should be treated as a separate item? I would like my result to look something like:
{ok,[[{string,1,"a"},
{string,1,"b"},
{string,1,"c"}],
[{string,2,"d"},
{string,2,"e"}],
[{string,3,"f"}]]}
Here is one correct lexer/parser pair that does the job. It produces one shift/reduce conflict, but I think it will solve your problem; you only need to clean up the tokens as you prefer.
I'm pretty sure there is an easier and faster way to do it, but during my own "lexer fighting times" it was so hard to find any information at all that I hope this gives you an idea of how to proceed with parsing in Erlang.
scanner.xrl
parser.yrl
output
The parser flow is the following:
Please allow me to give a few comments on the issues I've discovered in the original code.
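One point worth spelling out: `Endsymbol new_line.` tells yecc that a `new_line` token marks the end of the input, so the parser accepts as soon as it sees the first one and never looks at the remaining tokens. A minimal sketch of an alternative, which is an illustration and not necessarily the grammar from this answer, is to drop the `Endsymbol` declaration and treat `new_line` as an ordinary terminal that closes each line. The nonterminal name `lines` is my own choice, and the sketch assumes the first scanner.xrl from the question (the one that emits `new_line` tokens) is used unchanged.

```erlang
% parser.yrl (sketch): new_line is a normal terminal, not the end symbol.
Nonterminals lines line.
Terminals string new_line.
Rootsymbol lines.

% A file is one or more lines, each terminated by a new_line token.
lines -> line new_line : ['$1'].
lines -> line new_line lines : ['$1' | '$3'].

% A line is one or more string tokens, collected into a list.
line -> string : ['$1'].
line -> string line : ['$1' | '$2'].

Erlang code.
```

After regenerating the modules with `leex:file/1` and `yecc:file/1` and compiling them, `parser:parse(T)` on the token list from the question should return one inner list per input line. As written, the grammar expects every line, including the last, to end with a newline, and it does not accept blank lines; both cases would need extra rules.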