Difference in parsing behavior between runTests and parseString?

I observe a difference in the results of runTests depending on whether the input is an inline string or is read from a file.

This is the code that I have:

import pyparsing as pp
import sys
from pathlib import Path

# Largely built following the example and comments in https://stackoverflow.com/questions/55909620/capturing-block-over-multiple-lines-using-pyparsing
pp.enable_diag(pp.Diagnostics.enable_debug_on_named_expressions)

EOL = pp.LineEnd()
EmptyLine = pp.Suppress(pp.LineStart() + EOL)
englishLine_StopSeparator = pp.LineStart() + "OM"

englishLines = pp.Optional(EOL) + pp.Group(
    pp.OneOrMore(pp.SkipTo(pp.LineEnd()) + EOL, stopOn=englishLine_StopSeparator)
)

titleLine = pp.Combine(
    pp.Word(pp.nums) + (pp.SkipTo(EOL)) + pp.Suppress(EOL)
).setResultsName("titleLine_Section*")

invocationLine_StopSeparator = pp.LineStart() + titleLine

invocationLines = pp.Optional(EOL) + pp.Group(
    pp.OneOrMore(pp.SkipTo(pp.LineEnd()) + EOL, stopOn=invocationLine_StopSeparator)
)

Separator1 = pp.Keyword("*************")
prasna_Separator = pp.Keyword("============").suppress()
Separators = Separator1 | prasna_Separator
prasna_StopSeparator = pp.LineStart() + prasna_Separator

englishPreface = pp.Combine(pp.Keyword("Notes") + englishLines).setResultsName(
    "englishPreface_Section*"
)
invocation = pp.Combine(pp.Keyword("OM") + invocationLines).setResultsName(
    "invocation_Section*"
)
sectiontitleLine = pp.Combine(
    pp.Word(pp.nums) + pp.Literal(".") + pp.Word(pp.nums)
).setResultsName("sectiontitleLine_Section*")
prasnaLines = pp.Optional(EOL) + pp.Combine(
    pp.OneOrMore(pp.SkipTo(pp.LineEnd()) + EOL, stopOn=prasna_StopSeparator)
).setResultsName("prasna_Section*")

prasna = pp.Optional(prasna_Separator + pp.SkipTo(EOL)).suppress() + prasnaLines
prasna.setName("prasna")
parser = englishPreface + invocation + titleLine + pp.OneOrMore(prasna)
if len(sys.argv) < 2:
    print("Provide a file name ")
    raise SystemExit(2)
file_name = sys.argv[1]
text = Path(file_name).read_text(encoding="utf-8")
text="""\
Notes for the users of this document

EnglishPrefaceNotes EnglishPrefaceNotes
OM invocationFirstLine invocationFirstLine
3 Title Title Title Title
3.1 SectionTitle SectionTitle
AnnexureText AnnexureText 
========================
3.2 SectionTitle SectionTitle SectionTitle
PrasnaEndLine2_2

========================
"""
#text = Path(file_name).read_text(encoding="utf-8")
texts=[text]
parser.runTests(texts)

As you can see, I have put the text in the code as a multi-line string literal, and I also read it from a file and process it.

When I process the inline string, I get:

Match prasna at loc 147(6,1)
  3.1 SectionTitle SectionTitle
  ^
Matched prasna -> ['3.1 SectionTitle SectionTitle\nAnnexureText AnnexureText \n']
Match prasna at loc 204(8,1)
  ========================
  ^
Matched prasna -> ['\n', '3.2 SectionTitle SectionTitle SectionTitle\nPrasnaEndLine2_2\n']
Match prasna at loc 290(12,1)
  ========================
  ^
Matched prasna -> ['\n', '']
Match prasna at loc 316(13,2)

   ^
Match prasna failed, ParseException raised: , found end of text  (at char 316), (line:13, col:2)

Notes for the users of this document

EnglishPrefaceNotes EnglishPrefaceNotes
OM invocationFirstLine invocationFirstLine
3 Title Title Title Title
3.1 SectionTitle SectionTitle
AnnexureText AnnexureText
========================
3.2 SectionTitle SectionTitle SectionTitle
PrasnaEndLine2_2

========================

['Notes for the users of this document\n\nEnglishPrefaceNotes EnglishPrefaceNotes\n', 'OM invocationFirstLine invocationFirstLine\n', '3 Title Title Title Title', '3.1 SectionTitle SectionTitle\nAnnexureText AnnexureText \n', '\n', '3.2 SectionTitle SectionTitle SectionTitle\nPrasnaEndLine2_2\n', '\n', '']
- englishPreface_Section: ['Notes for the users of this document\n\nEnglishPrefaceNotes EnglishPrefaceNotes\n']
- invocation_Section: ['OM invocationFirstLine invocationFirstLine\n']
- prasna_Section: ['3.1 SectionTitle SectionTitle\nAnnexureText AnnexureText \n', '3.2 SectionTitle SectionTitle SectionTitle\nPrasnaEndLine2_2\n', '']
- titleLine_Section: ['3 Title Title Title Title']

But when I process the text read from the file, I get:

Match prasna at loc 147(6,1)
  3.1 SectionTitle SectionTitle
  ^
Matched prasna -> ['3.1 SectionTitle SectionTitle\nAnnexureText AnnexureText \n']
Match prasna at loc 204(8,1)
  ========================
  ^
Matched prasna -> ['\n', '3.2 SectionTitle SectionTitle SectionTitle\nPrasnaEndLine2_2\n']
Match prasna at loc 290(12,1)
  ========================
  ^
Match prasna failed, ParseException raised: , found end of text  (at char 315), (line:12, col:26)

Notes for the users of this document

EnglishPrefaceNotes EnglishPrefaceNotes
OM invocationFirstLine invocationFirstLine
3 Title Title Title Title
3.1 SectionTitle SectionTitle
AnnexureText AnnexureText
========================
3.2 SectionTitle SectionTitle SectionTitle
PrasnaEndLine2_2

========================
========================
^
ParseException: Expected end of text, found '='  (at char 290), (line:12, col:1)
FAIL: Expected end of text, found '='  (at char 290), (line:12, col:1)

From the debug output for the named expressions, things look correct in both cases up to the final separator line.
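One thing I can still check is whether the two inputs are really identical. The inline run reports end of text at char 316 (line 13), while the file run reports it at char 315 (line 12, col 26). That would be consistent with the file text being one character shorter, for example missing the final newline, but this is only a guess. Below is a minimal sketch for comparing the two strings. The helper names `first_difference` and `describe_difference` are hypothetical, and the two sample strings are stand-ins for the inline literal and the file contents, not the real data.

```python
def first_difference(a: str, b: str):
    """Return the index of the first differing character, or None if equal."""
    for i, (ca, cb) in enumerate(zip(a, b)):
        if ca != cb:
            return i
    # One string is a prefix of the other: they differ where the shorter ends.
    if len(a) != len(b):
        return min(len(a), len(b))
    return None


def describe_difference(a: str, b: str, context: int = 10) -> str:
    """Summarize lengths and the text around the first difference."""
    idx = first_difference(a, b)
    if idx is None:
        return "identical"
    return (
        f"len {len(a)} vs {len(b)}; first difference at {idx}: "
        f"{a[idx:idx + context]!r} vs {b[idx:idx + context]!r}"
    )


# Stand-in data (assumption): the tail of the inline literal, and the same
# tail as it might look if the file had no trailing newline.
literal_text = "PrasnaEndLine2_2\n\n========================\n"
file_text = "PrasnaEndLine2_2\n\n========================"

print(describe_difference(literal_text, file_text))
```

In the real script, the two arguments would be the triple-quoted `text` literal and the result of `Path(file_name).read_text(encoding="utf-8")`. Printing `repr(text[-30:])` for each is a quicker check of the same thing.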

I am running Python 3.9 on a Mac:

python3 -V
Python 3.9.6