PLY : Parsing error. Rule unexpectedly matching empty set of tokens

709 views Asked by At

So i'm trying to build a parser with PLY, but it doesn't work. It gives me errors that i can't seem to solve. I extracted a small part of the parser that i put in another file, for testing purpose : it doesn't work either.

Here is the simplified problem that i can't seem to solve :

Input example

#Trying to parse properties following the patter X1[some text]X2[other text]
#Simplified input used
GM[1]US[hello]

From my understanding the parser reads the first property (GM[1]) and triggers an error on the second one. He is taking the shortest possible reduction (which my rules indeed allow) and reads a single property. Then, there is no more rules to follow for the second one, which triggers the error.

So the problem would be : can you help me write my rules properly?

parser.py

#For simplicity, i only give you the rules, and removed operations on the tree.

def p_properties(self,p):
    "properties : property property_list"

def p_property_list(self,p):
    "property_list : "
    "               | properties"

def p_property(self,p):
    "property : PROPERTY_NAME single_property_content"

def p_single_property_content(self,p):
    "single_property_content : OBRACKET TEXT CBRACKET"

The starting rule is "properties". I would expect the input property (GM[1]) to match the rule property, and the second to match property_list. But i think that the first rule matches property AND property_list (as the empty production in property_list), which reduces to properties. Then, there is no more rules available for the second property to be read.

Here is the output of the parser:

State  : 0
Stack  : . LexToken(PROPERTY_NAME,'GM',1,0)
Action : Shift and goto state 2

State  : 2
Stack  : PROPERTY_NAME . LexToken(OBRACKET,'[',1,2)
Action : Shift and goto state 4

State  : 4
Stack  : PROPERTY_NAME OBRACKET . LexToken(TEXT,'1',1,3)
Action : Shift and goto state 7

State  : 7
Stack  : PROPERTY_NAME OBRACKET TEXT . LexToken(CBRACKET,']',1,4)
Action : Shift and goto state 8

State  : 8
Defaulted state 8: Reduce using 4
Stack  : PROPERTY_NAME OBRACKET TEXT CBRACKET . None
Action : Reduce rule [single_property_content -> OBRACKET TEXT CBRACKET] with ['[','1',']'] and goto state 5
Result : <tuple @ 0x7fea37eb4588> (('single_property_content', '1'))

State  : 5
Defaulted state 5: Reduce using 3
Stack  : PROPERTY_NAME single_property_content . None
Action : Reduce rule [property -> PROPERTY_NAME single_property_content] with ['GM',<tuple @ 0x7fea37eb4588>] and goto state 3
Result : <tuple @ 0x7fea3964f798> (('property', 'GM', ('single_property_con ...)

State  : 3
Defaulted state 3: Reduce using 2
Stack  : property . None
Action : Reduce rule [property_list -> <empty>] with [] and goto state 6
Result : <NoneType @ 0xa3f020> (None)

State  : 6
Defaulted state 6: Reduce using 1
Stack  : property property_list . None
Action : Reduce rule [properties -> property property_list] with [<tuple @ 0x7fea3964f798>,None] and goto state 1
Result : <tuple @ 0x7fea3964f678> (('properties', ('property', 'GM', ('sing ...)

State  : 1
Stack  : properties . LexToken(PROPERTY_NAME,'US',1,5)
ERROR: Error  : properties . LexToken(PROPERTY_NAME,'US',1,5)
Error while parsing : LexToken(PROPERTY_NAME,'US',1,5)

State  : 1
Stack  : properties . error
ERROR: Error  : properties . error

State  : 0
Stack  : . error
ERROR: Error  : . error

State  : 0
Stack  : . LexToken(OBRACKET,'[',1,7)
ERROR: Error  : . LexToken(OBRACKET,'[',1,7)

State  : 0
Stack  : . LexToken(TEXT,'hello',1,8)
ERROR: Error  : . LexToken(TEXT,'hello',1,8)

State  : 0
Stack  : . LexToken(CBRACKET,']',1,13)
ERROR: Error  : . LexToken(CBRACKET,']',1,13)

State  : 0
Stack  : . $end
ERROR: Error  : . $end
None

I've modified these rules in 15 different ways (right recursion, left), to no avail. Can you guys tell me what i'm doing wrong?

Edit After some trouble when implementing the solution (only worked when i parsed 2 elements but parsing more created errors), i managed to make it work, thanks to @rici's help.

Solution

def p_properties(self,p):
    """
    properties : property_list property
    """
    p[0] = ('properties',p[1], p[2])


def p_property_list(self,p):
    """
    property_list : property_list property
                    | 
    """

    if(len(p)==3):
        p[0] = ('property_list',p[1],p[2])

def p_property(self,p):
    """
    property : PROPERTY_NAME property_content_list
    """
    p[0] = ('property', p[1], p[2])

Input

GM[1]AP[hello]AW[world]C[1.this is a comment]C[2.this is a comment]C[3.this is a comment]
TM[50]WR[alalala]GM[will it fail?]GM[should succeed]

Result

('properties', ('property_list', ('property_list', ('property_list', 
('property_list', ('property_list', ('property_list', 
('property_list',('property_list', ('property_list', None, 
('property', 'GM', ('property_content', '1'))),
('property', 'AP', ('property_content','hello'))),
('property', 'AW', ('property_content', 'world'))), 
('property', 'C', ('property_content', '1.this is a comment'))), 
('property', 'C', ('property_content', '2.this is a comment'))), 
('property', 'C', ('property_content', '3.this is a comment'))), 
('property', 'TM', ('property_content', '50'))), ('property', 'WR', 
('property_content', 'alalala'))), ('property', 'GM', 
('property_content','will it fail?'))),
('property', 'GM', ('property_content', 'should succeed')))

Thank you for your help.

Loïc.

1

There are 1 answers

3
rici On BEST ANSWER

The productions for a Ply parser are contained in the docstring, which is the first string literal (if any) at the beginning of the function.

The docstring for p_property_list is, therefore, property_list:. (You can confirm this by looking at p_property_list.__doc__.) So that's all that Ply sees. Use a single multiline string literal if you want to provide multiple productions to a single function, which is why you will usually see raw string literals as docstrings:

def p_property_list(self, p):
    """property_list:
                    | property_list property
    """
    # ...

However, it is quite common to use two different functions in such cases. See below.


Don't use right recursion unless necessary. The correct grammar is:

property_list: property
             | property_list property

property: PROPERTY_NAME single_property_content

You could call the first production properties if you prefer, but you don't need two different non-terminals. (You might not need single_property_content either, but that depends on the rest of your grammar.)

In Ply, there is no need to align reduction functions with non-terminals. It is quite common to use different functions for the different alternatives whose reduce actions differ, and you can often combine similar productions for different non-terminals. For a simple example:

# This function can be used as the base case of any list non-terminal
# whose semantic value is a list (and whose base case is not empty)
def p_list_base(self, p):
    """property_list: property
       function_list: function
    """
    p[0] = [p[1]]

# This function can be used as the recursion of any list non-terminal
# whose semantic value is a list.
def p_list_recur(self, p):
    """property_list: property_list property
       function_list: function_list function
    """
    p[1].append(p[2])
    p[0] = p[1]