pyPEG2 parsing of newlines


I'm trying to use pyPEG2 to translate MoinMoin markup to Markdown, and I need to pay attention to newlines in certain cases. However, I can't even get my newline parsing tests to work. I'm new to pyPEG and my Python is rusty. Please bear with me.

Here's the code:

#!/usr/local/bin/python3
from pypeg2 import *
import re

class Newline(List):
    grammar = re.compile(r'\n')

parse("\n", Newline)
parse("""
""", Newline)

This results in:

Traceback (most recent call last):
  File "./pyPegNewlineTest.py", line 7, in <module>
    parse("\n", Newline)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pypeg2/__init__.py", line 667, in parse
    t, r = parser.parse(text, thing)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pypeg2/__init__.py", line 794, in parse
    raise r
  File "<string>", line 2

    ^
SyntaxError: expecting match on \n

It's as if pypeg is inserting an empty line after the \n.

Trying other options such as

    grammar = re.compile(r'\n', re.MULTILINE)
    grammar = re.compile(r'\r\n|\r|\n', re.MULTILINE)
    grammar = contiguous(re.compile(r'\r\n|\r|\n', re.MULTILINE))

and various combinations of those don't change the error message (although I don't think I tried all combinations). Changing Newline to subclass str instead of List doesn't change the error either.

Update

I have figured out that pypeg is stripping the newline before parsing it:

#!/usr/local/bin/python3
from pypeg2 import *
import re
class Newline(str):
    grammar = contiguous(re.compile(r'a'))

parse("\na", Newline)
parse("""
a""", Newline)

print("Success, of a sort.")

Running this results in:

Success, of a sort.

If I override the Newline's parse method I don't even see the newline. The first thing it gets is the "a". This is consistent with what I'm seeing elsewhere. pypeg strips all leading whitespace, even when you specify contiguous.

So, that's what's happening. Not sure what to do about it.

There is 1 answer

Florian

Yes, by default pypeg removes whitespace, including newlines, before it tries to match anything. This is easily configurable through the optional whitespace argument of the parse() function, for example:

parse("\na", Newline, whitespace=re.compile(r"[ \t\r]"))

With this setting, spaces, tabs, and carriage returns are still skipped, but newlines (\n) are not. For this example, the parser now correctly reports the syntax error:

SyntaxError: expecting match on a
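
To see why the choice of whitespace pattern matters, here is a minimal standard-library sketch of the "skip whitespace, then match" idea. It is not pypeg's actual code: skip_then_match is a hypothetical helper, and DEFAULT_WS is only assumed to resemble pypeg's default whitespace pattern. NO_NEWLINE_WS is the pattern from the parse() call above.

```python
import re

# Hypothetical stand-in for a parser's whitespace-skipping step.
# This is NOT pypeg's implementation, only the same idea in miniature.
def skip_then_match(text, pattern, whitespace):
    """Strip leading `whitespace` matches repeatedly, then try `pattern`."""
    while True:
        m = whitespace.match(text)
        if not m or m.end() == 0:
            break  # nothing (more) to skip
        text = text[m.end():]
    return pattern.match(text)

NEWLINE = re.compile(r"\n")
DEFAULT_WS = re.compile(r"\s+")         # assumption: similar to pypeg's default
NO_NEWLINE_WS = re.compile(r"[ \t\r]")  # pattern from the answer above

# With \s+ the newline is consumed as whitespace, so NEWLINE never sees it.
swallowed = skip_then_match(" \n", NEWLINE, DEFAULT_WS)

# Without \n in the whitespace pattern, only the space is skipped.
kept = skip_then_match(" \n", NEWLINE, NO_NEWLINE_WS)

print(swallowed)         # None
print(kept is not None)  # True
```

In the sketch, the same Newline pattern fails or succeeds depending only on what counts as skippable whitespace, which matches the behaviour described in the question's update.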