How do you parse sections of text with Lark in Python

Question

Asked by Alan W. Smith on 14 February 2023 at 04:30 · 104 views

I'm trying to figure out how to use the Lark Python Module to parse a document that looks like this:

---> TITLE

Introduction

---> CONTENT

The quick

Brown fox

---> TEST

Jumps over

---> CONTENT 

The lazy dog

Each ---> line starts a section of a specific type, and that section's content runs until the next ---> line.

So far, I have this:


from lark import Lark

parser = Lark(r"""
    start: section*
    | line*

    section.1 : "---> " SECTION_TITLE "\n\n"
    SECTION_TITLE.1 :  "TITLE" | "CONTENT" | "SOURCE" | "OUTPUT"

    line.-1: ANY_LINE
    ANY_LINE.-1: /.+\n*/

    """, start='start')

with open("src/index.mdx") as _in:
    print(parser.parse(_in.read()))

It parses the file, but everything shows up in ANY_LINE tokens instead of splitting out the section headers. I'm new to this type of parser and feel like I'm missing something obvious, but I haven't been able to figure it out.
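
One thing that can be checked outside Lark: the ANY_LINE pattern matches header lines just as happily as content lines. On top of that, start is section* | line*, and a section there is only the header itself, so line* is the only branch that can cover a file mixing headers and content. The snippet below is a small sketch that uses Python's re module as a stand-in for the terminal's regex. That stand-in and the shortened sample string are assumptions for illustration; it does not run Lark's lexer.

```python
import re

# Same pattern as the ANY_LINE terminal in the grammar above.
ANY_LINE = re.compile(r".+\n*")

# Shortened version of the document from the question (illustrative).
sample = "---> TITLE\n\nIntroduction\n\n---> CONTENT\n\nThe quick\n\n"

# Split the sample the way the pattern alone would: headers and
# content lines both come out as ordinary chunks.
chunks = ANY_LINE.findall(sample)
for chunk in chunks:
    print(repr(chunk))
```

Every chunk, including the two ---> headers, is a valid ANY_LINE match, so nothing in the pattern itself separates headers from content.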

There is 1 answer

**Alan W. Smith** · Answer 1 · 2023-02-14T05:03:04+00:00

I think this is doing what I'm after. I'm not marking this as the answer for now, in case other folks have better ideas.

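from lark import Lark

parser = Lark(r"""
    start: section*

    // A section is a header (marker + type) followed by its own content lines
    section : THING SECTION_TITLE line*
    THING : "--->"
    SECTION_TITLE :  "TITLE" | "CONTENT" | "SOURCE" | "OUTPUT" | "TEST"

    // Lower priority than the header terminals
    line: ANY_LINE
    ANY_LINE.-1: /.+\n*/

    %import common.WS
    %ignore WS

    """, start='start')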

TechQA.
