I'm trying to use pyPEG2 to translate MoinMoin markup to Markdown, and I need to pay attention to newlines in certain cases. However, I can't even get my newline parsing tests to work. I'm new to pyPEG and my Python is rusty. Please bear with me.
Here's the code:
#!/usr/local/bin/python3
from pypeg2 import *
import re
class Newline(List):
    grammar = re.compile(r'\n')
parse("\n", Newline)
parse("""
""", Newline)
This results in:
Traceback (most recent call last):
  File "./pyPegNewlineTest.py", line 7, in <module>
    parse("\n", Newline)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pypeg2/__init__.py", line 667, in parse
    t, r = parser.parse(text, thing)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pypeg2/__init__.py", line 794, in parse
    raise r
  File "<string>", line 2
    ^
SyntaxError: expecting match on \n
It's as if pypeg is inserting an empty line after the \n.
Trying other options such as
grammar = re.compile(r'\n', re.MULTILINE)
grammar = re.compile(r'\r\n|\r|\n', re.MULTILINE)
grammar = contiguous(re.compile(r'\r\n|\r|\n', re.MULTILINE))
and various combinations of those doesn't change the error message (although I don't think I tried all combinations). Changing Newline to subclass str instead of List doesn't change the error either.
Update
I have figured out that pypeg is stripping the newline before parsing it:
#!/usr/local/bin/python3
from pypeg2 import *
import re
class Newline(str):
    grammar = contiguous(re.compile(r'a'))
parse("\na", Newline)
parse("""
a""", Newline)
print("Success, of a sort.")
Running this results in:
Success, of a sort.
If I override Newline's parse method, I don't even see the newline. The first thing it gets is the "a". This is consistent with what I'm seeing elsewhere: pypeg strips all leading whitespace, even when you specify contiguous.
So, that's what's happening. Not sure what to do about it.
Yes, by default pypeg removes whitespace, including newlines. This is easily configurable through the optional whitespace argument of the parse() function. If you pass a pattern that matches only spaces and tabs, those will still be skipped, but newlines (\n) will not. With that change, the parser no longer silently drops the newline and correctly finds the syntax error.