I am trying to write a parser for recipe ingredients. I am noticing that the order in which the parser handles tokens seems to follow the order of the token rules in the Jison file, rather than what is defined in the EBNF grammar.
For example, parsing "6 tablespoons unsalted butter, cut into 1-inch pieces" yields:
Error: Parse error on line 1:
6 tablespoons unsalted
--^
Expecting 'UNIT_NAME', 'NUMBER', 'SLASH', got 'WORD'
I would expect the grammar to see UNIT_NAME (which is "tablespoons") before it eats a WORD. What is the right grammar approach here? I have been using the interactive Jison parser to validate the grammar states and didn't see any gotchas so far.
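To make the expectation concrete, here is a rough sketch in plain JavaScript of the token sequence I would expect for the start of that input. It is a hypothetical mini-lexer, not Jison's implementation: the `rules` array and `tokenize` function are made up for illustration, the unit list is shortened, and it assumes a longest-match strategy (as `%options flex` is meant to provide) where ties go to the rule listed first.

```javascript
// Hypothetical mini-lexer for illustration only (not Jison's code).
// Rules are listed in the same order as in the lexer section: NUMBER, UNIT_NAME, WORD.
const rules = [
  { token: "NUMBER", re: /^(?:[1-9][0-9]+|[0-9])/ },
  { token: "UNIT_NAME", re: /^(?:tablespoons|tablespoon|cups|cup)\b/i },
  { token: "WORD", re: /^\w+\b/ },
];

function tokenize(text) {
  const tokens = [];
  let rest = text;
  while (rest.length > 0) {
    // Skip whitespace, like the `\s+` rule.
    const ws = /^\s+/.exec(rest);
    if (ws) {
      rest = rest.slice(ws[0].length);
      continue;
    }
    let best = null;
    for (const rule of rules) {
      const m = rule.re.exec(rest);
      // Only a strictly longer match replaces the current best,
      // so on a tie the earlier rule is kept (an assumption of this sketch).
      if (m && (best === null || m[0].length > best.text.length)) {
        best = { token: rule.token, text: m[0] };
      }
    }
    if (best === null) throw new Error("Unrecognized text: " + rest);
    tokens.push(best);
    rest = rest.slice(best.text.length);
  }
  return tokens;
}

const names = tokenize("6 tablespoons unsalted butter").map((t) => t.token);
console.log(names.join(" ")); // NUMBER UNIT_NAME WORD WORD
```

In this sketch "tablespoons" matches both the unit rule and the word rule with the same length, and the unit rule wins only because it is listed first.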
Jison Grammar
%lex
%options flex case-insensitive
UnitName [teaspoons|teaspoon|tablespoons|tablespoon|fluid ounces|fluid ounce|ounces|ounce|cups|cup|pints|pint|quarts|quart|gallons|gallon|pounds|pound|milliliters|milliliter|deciliters|deciliter|liters|liter]\b
Word \w+\b
NUMBER [1-9][0-9]+|[0-9]
CHAR [a-zA-Z0-9_-]
%%
\s+ /* skip whitespace */
{NUMBER} return 'NUMBER'
{UnitName} return "UNIT_NAME";
{Word} return 'WORD'
{CHAR} return 'CHAR'
"/" return "SLASH";
"-" return "HYPHEN"
"," return "COMMA";
<<EOF>> return 'EOF';
/lex
/* enable EBNF grammar syntax */
%ebnf
/* language grammar */
%start ingredient
%%
ingredient
: ingredient_format
{ return $1; }
;
ingredient_format
: unit_count UNIT_NAME ingredient_name COMMA ingredient_info EOF
{ $$ = {'count': $1, 'unit': $2, 'item': $3, info: $5}; }
| unit_count UNIT_NAME ingredient_name EOF
{ $$ = {'count': $1, 'unit': $2, 'item': $3, info: null}; }
;
unit_count
: NUMBER
{ $$ = parseInt($1); }
| NUMBER SLASH NUMBER
{ $$ = parseInt($1) / parseInt($3); }
| NUMBER NUMBER SLASH NUMBER
{ $$ = parseInt($1) + (parseInt($2) / parseInt($4)); }
;
ingredient_name
: WORD+
{ $$ = $1; }
;
ingredient_info
: ""
{ $$ = ''; }
| WORD+
{ $$ = $1; }
;
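One thing worth checking in the lexer section above is the `UnitName` definition, which wraps its alternatives in square brackets. The snippet below is a minimal sketch in plain JavaScript regexes (not Jison itself, and with a shortened unit list) showing how a bracketed character class behaves differently from a parenthesized group; whether this fully explains the parse error is an assumption I have not verified against Jison.

```javascript
// Sketch only: plain JavaScript regexes, not Jison's lexer.
// Square brackets define a character class (a single character from the set),
// while parentheses define a group of whole-word alternatives.
const asWritten = /^[teaspoons|tablespoons|cups|cup]\b/i;
const asGroup = /^(?:teaspoons|teaspoon|tablespoons|tablespoon|cups|cup)\b/i;

const input = "tablespoons unsalted";

// The class consumes only "t"; the next character is "a",
// so the \b word boundary fails and nothing matches.
const classMatch = asWritten.exec(input);

// The group matches the whole word, and \b succeeds before the space.
const groupMatch = asGroup.exec(input);

console.log(classMatch);    // null
console.log(groupMatch[0]); // tablespoons
```

If the bracketed pattern never matches a full unit name, the `Word` rule would be the only one left to consume "tablespoons", which would be consistent with the "got 'WORD'" message.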
Gist
I created a gist with some text strings and a simple parser to test: https://gist.github.com/aphexddb/ddc83d57c7f1c1b96458