Jison: Getting parsed token instead of what is defined in Grammar


I am trying to generate a parser for recipe ingredients. I am noticing that the order in which the parser handles tokens seems to follow the order of the token rules in the Jison file, rather than what is defined in the EBNF grammar.

For example, parsing "6 tablespoons unsalted butter, cut into 1-inch pieces" yields:

Error: Parse error on line 1:
6 tablespoons unsalted
--^
Expecting 'UNIT_NAME', 'NUMBER', 'SLASH', got 'WORD'

I would expect the parser to see UNIT_NAME (here, "tablespoons") before it consumes a WORD. What is the right grammar approach here? I have been using the interactive Jison parser to validate the grammar states and haven't seen any gotchas so far.

Jison Grammar

%lex
%options flex case-insensitive

UnitName                    [teaspoons|teaspoon|tablespoons|tablespoon|fluid ounces|fluid ounce|ounces|ounce|cups|cup|pints|pint|quarts|quart|gallons|gallon|pounds|pound|milliliters|milliliter|deciliters|deciliter|liters|liter]\b
Word                        \w+\b
NUMBER                      [1-9][0-9]+|[0-9]
CHAR                        [a-zA-Z0-9_-]

%%

\s+                      /* skip whitespace */
{NUMBER}                 return 'NUMBER'
{UnitName}               return "UNIT_NAME";
{Word}                   return 'WORD'
{CHAR}                   return 'CHAR'
"/"                      return "SLASH";
"-"                      return "HYPHEN"
","                      return "COMMA";
<<EOF>>                  return 'EOF';

/lex

/* enable EBNF grammar syntax */
%ebnf

/* language grammar */
%start ingredient
%%

ingredient
    : ingredient_format
        { return $1; }
    ;

ingredient_format
    : unit_count UNIT_NAME ingredient_name COMMA ingredient_info EOF
        { $$ = {'count': $1, 'unit': $2, 'item': $3, info: $5}; }
    | unit_count UNIT_NAME ingredient_name EOF
        { $$ = {'count': $1, 'unit': $2, 'item': $3, info: null}; }
    ;

unit_count
    : NUMBER
        { $$ = parseInt($1); }
    | NUMBER SLASH NUMBER
        { $$ = parseInt($1) / parseInt($3); }
    | NUMBER NUMBER SLASH NUMBER
        { $$ = parseInt($1) + (parseInt($2) / parseInt($4)); }
    ;

ingredient_name
    : WORD+
        { $$ = $1; }
    ;

ingredient_info
    : ""
        { $$ = ''; }
    | WORD+
        { $$ = $1; }
    ;
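
One likely cause, offered as a sketch rather than a confirmed diagnosis: the UnitName macro is wrapped in square brackets, which in a regular expression define a character class (any single one of the listed characters, including "|"), not an alternation of whole words. Followed by \b, that pattern cannot match the start of "tablespoons", so the Word rule matches instead and the parser receives WORD where the grammar expects UNIT_NAME. The JavaScript below models this with a simplified longest-match tokenizer in which ties go to the earlier rule. That tie-breaking is my understanding of how the flex option behaves; this is not Jison's actual code. The helper names tokenize and rulesWith are hypothetical, the unit list is shortened, and the EOF rule is omitted.

```javascript
// Simplified model of a longest-match lexer (an assumption, NOT Jison's implementation).
// Rules are tried in order; the longest match wins, and ties go to the earlier rule.
function tokenize(input, rules) {
  const tokens = [];
  let rest = input;
  while (rest.length > 0) {
    let best = null;
    for (const [name, re] of rules) {
      const m = re.exec(rest);
      // Only a strictly longer match replaces the current best.
      if (m && m[0].length > 0 && (!best || m[0].length > best.text.length)) {
        best = { name, text: m[0] };
      }
    }
    if (!best) throw new Error("Unrecognized input: " + rest);
    if (best.name !== "SKIP") tokens.push(best.name); // whitespace is dropped
    rest = rest.slice(best.text.length);
  }
  return tokens;
}

// Shortened unit list for illustration; the original macro lists many more units.
// Brackets: ONE character out of the set {t,e,a,s,p,o,n,|,b,l,c,u}.
const unitAsCharClass = /^[teaspoons|tablespoons|cups|cup]\b/i;
// Parentheses: an alternation of whole words.
const unitAsGroup = /^(?:teaspoons|teaspoon|tablespoons|tablespoon|cups|cup)\b/i;

// Hypothetical helper: the question's rules in their original order,
// with the UNIT_NAME pattern swapped in.
function rulesWith(unitPattern) {
  return [
    ["SKIP", /^\s+/],
    ["NUMBER", /^(?:[1-9][0-9]+|[0-9])/],
    ["UNIT_NAME", unitPattern],
    ["WORD", /^\w+\b/],
    ["CHAR", /^[a-zA-Z0-9_-]/],
    ["SLASH", /^\//],
    ["HYPHEN", /^-/],
    ["COMMA", /^,/],
  ];
}

const input = "6 tablespoons unsalted butter, cut into 1-inch pieces";
const before = tokenize(input, rulesWith(unitAsCharClass));
const after = tokenize(input, rulesWith(unitAsGroup));

console.log(before.join(" "));
console.log(after.join(" "));
```

In this model the bracketed form lexes "tablespoons" as WORD, while the parenthesized form lexes it as UNIT_NAME, because both rules match the same length and UNIT_NAME is listed first. The trailing "1-inch" lexes as NUMBER CHAR WORD in both cases (CHAR is listed before the "-" rule), and ingredient_info as written only accepts WORD+, so a second parse error may follow once the unit is recognized.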

Gist

I created a gist with some text strings and a simple parser to test: https://gist.github.com/aphexddb/ddc83d57c7f1c1b96458
