How to make Python textX parse island grammars

I am trying to parse a network device configuration file. Although I go through the whole file, I do not want to capture every term in it, only a subset.

So assume that the configuration file is as follows:

bootfile abc.bin
motd "this is a
message for everyone to follow"
.
.
.
group 1
group 2
.
.
.
permit tcp ab.b.c.d b.b.b.y eq 222
permit tcp ab.b.c.d b.b.b.y eq 222
permit tcp ab.b.c.d b.b.b.y eq 222
.
.
.
interface a
  description this is interface a
  vlan 33 

interface b
  description craigs list
  no shut
  vlan 33
  no ip address
.
.
.

I am only trying to capture the interface line (as is) and the description and vlan lines (as is); everything else would be ignored. The contents within each interface would be split into 2 attributes: valid and invalid.

So the grammar would look something like this:

Config[noskipsp]:
  interfaces *= !InterfaceDefinition | InterfaceDefinition
;

InterfaceDefinition:
  interface = intf
  valids *= valid
  invalids *= invalid
;

intf: /^interface .*\n/;
cmds: /^ (description|vlan) .*\n/;
invalid: /^(?!(interface|description|vlan) .*\n;

The goal is to obtain a Python array of interfaces where each interface has 2 attributes, valids and invalids, each of which is an array. The valids array would contain the description or vlan entries, and the invalids array would contain everything else.

There are two challenges that I can't seem to address: (1) how to ignore all the other content that is not an interface definition, and (2) how to ensure that every interface ends up as its own interface and not in the invalids attribute of another interface.

Unfortunately, the grammar does not fail outright when parsing the text, but my understanding of how the parser goes through the text appears to be at fault, since it complains the moment it tries to read any text past the 'interface .*' section.

Additionally, I am currently testing with a file containing only interface definitions, but the goal is to process full files while targeting only the interfaces, so the grammar must be able to discard all other content.


Updated progress

Originally, after Igor's first answer, I was able to create a grammar that fully parsed a dummy configuration file I had, though the results were not the desired ones, probably due to my ignorance. With Igor's second, updated answer, I have decided to refactor the original grammar and simplify it to try to match my sample dummy configuration.

My goal at the model level is to have an object that resembles the following pseudo structure:

class network_config:

    def __init__(self):
        self.invalid = []     # Entries that do not match the higher-level
                              # hierarchy objects
        self.interfaces = []  # Interface definitions

class Interface:

    def __init__(self):
        self.name = ""
        self.vlans = []
        self.description = ""
        self.junk = []  # All other configuration lines within the
                        # interface that match neither vlans nor
                        # description
The dummy configuration file (data to be parsed) looks as follows:

junk
irrelevant configuration line
interface GigabitEthernet0/0/0
   description 3 and again
   nonsense
   vlan 33
   this and that
   vlan 16
interface GigabitEthernet0/0/2
   something for the nic
   vlan 22
   description here and there
! a simple comment
intermiediate
more nonrelated information

interface GigabitEthernet0/0/3
   this is junk too
   vlan 99
don't forget this
interface GigabitEthernet0/0/1
interface GigabitEthernet0/0/9
nothing of interest
silly stuff
some final data
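
For reference, the target structure can be sketched with a plain line-by-line scan. This is deliberately not textX, only a way to pin down what the resulting model should contain. The `scan` function, the dataclass layout, the shortened sample, and the rule that an unindented line closes the current interface block are all assumptions made for the illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Interface:
    name: str
    description: str = ""
    vlans: list = field(default_factory=list)
    junk: list = field(default_factory=list)


@dataclass
class NetworkConfig:
    invalid: list = field(default_factory=list)
    interfaces: list = field(default_factory=list)


def scan(text):
    """Classify each line by its first word; indentation decides ownership."""
    config = NetworkConfig()
    current = None  # the interface block we are inside, if any
    for line in text.splitlines():
        words = line.split()
        if not words:
            continue  # skip blank lines
        if words[0] == "interface" and len(words) > 1:
            current = Interface(name=words[1])
            config.interfaces.append(current)
        elif current is not None and line[0].isspace():
            rest = " ".join(words[1:])
            if words[0] == "vlan":
                current.vlans.append(rest)
            elif words[0] == "description":
                current.description = rest
            else:
                current.junk.append(line.strip())
        else:
            current = None  # assumption: an unindented line ends the block
            config.invalid.append(line.strip())
    return config


# Shortened version of the dummy configuration.
sample = """junk
interface GigabitEthernet0/0/0
   description 3 and again
   nonsense
   vlan 33
   vlan 16
! a simple comment
interface GigabitEthernet0/0/1
nothing of interest
"""

config = scan(sample)
for intf in config.interfaces:
    print(intf.name, intf.vlans, repr(intf.description), intf.junk)
print(config.invalid)
```

Whatever grammar is used, its model can be compared against this kind of reference output, keeping in mind that a grammar may assign unindented lines differently.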

And the new textX grammar that I have created is as follows:

Config:
    (
        invalid*=Junk
        | interfaces*=Interface
    )*
;

Junk:
   /(?s)(?!((interface)|(vlan)|(description)).)[^\n]*\n/  // <- consume a line that is not a 'vlan', 'description', or 'interface' line
;

Interface:
   'interface' name=/[^\n]+\n/
   ( description+=Description
   | vlans*=Vlan
   | invalids*=InterfaceJunk
   )*
;

Description:
    /description[^\n]+\n/
;

Vlan:
    /vlan[^\n]+\n/
;

InterfaceJunk:
    /(?!((interface)|(vlan)|(description))).[^\n]*\n/  // <- consume everything that is not an interface, vlan, or description
;
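
The Junk pattern from the grammar above can be tried in isolation with the standard `re` module, which is a quick way to see where it will and will not match. This is a small sketch; `try_junk` and the sample text are made up for the illustration, and the remark about whitespace reflects textX's default behaviour of skipping whitespace before a match when `noskipws` is not set.

```python
import re

# Junk pattern copied from the grammar above.
JUNK = re.compile(r"(?s)(?!((interface)|(vlan)|(description)).)[^\n]*\n")

text = (
    "junk\n"
    "irrelevant configuration line\n"
    "interface GigabitEthernet0/0/0\n"
    "   vlan 33\n"
)


def try_junk(pos):
    """Return the text Junk would consume at `pos`, or None if it fails."""
    m = JUNK.match(text, pos)
    return m.group(0) if m else None


print(repr(try_junk(0)))   # first junk line
print(repr(try_junk(5)))   # second junk line
print(repr(try_junk(35)))  # starts with 'interface', rejected by the lookahead
print(repr(try_junk(66)))  # raw position of the indented vlan line
print(repr(try_junk(69)))  # same line once the leading whitespace is skipped
```

The last two calls show why whitespace handling matters: at the raw position the indented vlan line looks like junk, while after the leading whitespace is skipped the lookahead rejects it.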

To my surprise, when I ran the parser with this grammar, it went into an infinite loop. I also noticed that changing the root rule from

Config:
    (
        invalid*=Junk
        | interfaces*=Interface
    )*
;

*** PARSING MODEL ***
>> Matching rule Model=Sequence at position 0 => *junk irrel
   >> Matching rule Config=ZeroOrMore in Model at position 0 => *junk irrel
      >> Matching rule OrderedChoice in Config at position 0 => *junk irrel
         >> Matching rule __asgn_zeroormore=ZeroOrMore[invalid] in Config at position 0 => *junk irrel
            ?? Try match rule Junk=RegExMatch((?s)(?!((interface)|(vlan)|(description)).)[^\n]+\n) in __asgn_zeroormore at position 0 => *junk irrel
            ++ Match 'junk
' at 0 => '*junk *irrel'
            ?? Try match rule Junk=RegExMatch((?s)(?!((interface)|(vlan)|(description)).)[^\n]+\n) in __asgn_zeroormore at position 5 => junk *irrelevant
            ++ Match 'irrelevant configuration line
' at 5 => 'junk *irrelevant configuration line *'
            ?? Try match rule Junk=RegExMatch((?s)(?!((interface)|(vlan)|(description)).)[^\n]+\n) in __asgn_zeroormore at position 35 => tion line *interface
            -- NoMatch at 35
         <<+ Matched rule __asgn_zeroormore=ZeroOrMore[invalid] in __asgn_zeroormore at position 35 => tion line *interface
      <<+ Matched rule OrderedChoice in Config at position 35 => tion line *interface
      >> Matching rule OrderedChoice in Config at position 35 => tion line *interface
         >> Matching rule __asgn_zeroormore=ZeroOrMore[invalid] in Config at position 35 => tion line *interface
            ?? Try match rule Junk=RegExMatch((?s)(?!((interface)|(vlan)|(description)).)[^\n]+\n) in __asgn_zeroormore at position 35 => tion line *interface
            -- NoMatch at 35
         <<- Not matched rule __asgn_zeroormore=ZeroOrMore[invalid] in __asgn_zeroormore at position 35 => tion line *interface
      <<- Not matched rule OrderedChoice in Config at position 35 => tion line *interface
      >> Matching rule OrderedChoice in Config at position 35 => tion line *interface
         >> Matching rule __asgn_zeroormore=ZeroOrMore[invalid] in Config at position 35 => tion line *interface
            ?? Try match rule Junk=RegExMatch((?s)(?!((interface)|(vlan)|(description)).)[^\n]+\n) in __asgn_zeroormore at position 35 => tion line *interface
            -- NoMatch at 35
         <<- Not matched rule __asgn_zeroormore=ZeroOrMore[invalid] in __asgn_zeroormore at position 35 => tion line *interface

to

Config:
    (
        invalid*=Junk interfaces*=Interface
    )*
;

*** PARSING MODEL ***
>> Matching rule Model=Sequence at position 0 => *junk irrel
   >> Matching rule Config=ZeroOrMore in Model at position 0 => *junk irrel
      >> Matching rule Sequence in Config at position 0 => *junk irrel
         >> Matching rule __asgn_zeroormore=ZeroOrMore[invalid] in Config at position 0 => *junk irrel
            ?? Try match rule Junk=RegExMatch((?s)(?!((interface)|(vlan)|(description)).)[^\n]+\n) in __asgn_zeroormore at position 0 => *junk irrel
            ++ Match 'junk
' at 0 => '*junk *irrel'
            ?? Try match rule Junk=RegExMatch((?s)(?!((interface)|(vlan)|(description)).)[^\n]+\n) in __asgn_zeroormore at position 5 => junk *irrelevant
            ++ Match 'irrelevant configuration line
' at 5 => 'junk *irrelevant configuration line *'
            ?? Try match rule Junk=RegExMatch((?s)(?!((interface)|(vlan)|(description)).)[^\n]+\n) in __asgn_zeroormore at position 35 => tion line *interface
            -- NoMatch at 35
         <<+ Matched rule __asgn_zeroormore=ZeroOrMore[invalid] in __asgn_zeroormore at position 35 => tion line *interface
         >> Matching rule __asgn_zeroormore=ZeroOrMore[interfaces] in Config at position 35 => tion line *interface
            >> Matching rule Interface=Sequence in __asgn_zeroormore at position 35 => tion line *interface
               ?? Try match rule StrMatch(interface) in Interface at position 35 => tion line *interface
               ++ Match 'interface' at 35 => 'tion line *interface* '
               >> Matching rule __asgn_plain=Sequence[name] in Interface at position 44 =>  interface* GigabitEt
                  ?? Try match rule RegExMatch([^\n]+\n) in __asgn_plain at position 45 => interface *GigabitEth
                  ++ Match 'GigabitEthernet0/0/0
' at 45 => 'interface *GigabitEthernet0/0/0 *'
               <<+ Matched rule __asgn_plain=Sequence[name] in __asgn_plain at position 66 => rnet0/0/0 *   descrip
               >> Matching rule ZeroOrMore in Interface at position 66 => rnet0/0/0 *   descrip
                  >> Matching rule OrderedChoice in Interface at position 66 => rnet0/0/0 *   descrip
                     >> Matching rule __asgn_oneormore=OneOrMore[description] in Interface at position 66 => rnet0/0/0 *   descrip
                        ?? Try match rule Description=RegExMatch(description[^\n]+\n) in __asgn_oneormore at position 69 => t0/0/0    *descriptio
                        ++ Match 'description 3 and again
' at 69 => 't0/0/0    *description 3 and again *'
                        ?? Try match rule Description=RegExMatch(description[^\n]+\n) in __asgn_oneormore at position 96 =>  again    *nonsense
                        -- NoMatch at 96

gave 2 different results, though neither was what I was hoping for. In the first form, the parser ends up stuck in a loop, continuously looking for the invalid patterns (i.e. Junk). In the second form, the parser gets past the search for invalids and at least finds the first interface, GigabitEthernet0/0/0, but once inside the interface it goes into an infinite loop again.

I was under the impression that writing ( attr1*=pattern1 | attr2*=pattern2 | attr3*=pattern3 ) meant that each of the patterns would be tried in turn, but the parser seems to stay on pattern1 even while pattern1 is not being found (ordered choice is described as such). I must have something in the grammar that is causing this.
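
One plausible reading of the logs above is that a zero-or-more expression succeeds even when it matches nothing, so an ordered choice whose first alternative is starred never reaches the later alternatives. The toy combinators below mimic general PEG semantics only; they are not Arpeggio's or textX's implementation, and all names in the sketch are invented for the illustration.

```python
import re


def regex(pattern):
    """Terminal: match `pattern` at pos, return the new position or None."""
    compiled = re.compile(pattern)

    def parse(text, pos):
        m = compiled.match(text, pos)
        return m.end() if m else None

    return parse


def zero_or_more(parser):
    """Repeat `parser`; succeeds even when it never matches."""

    def parse(text, pos):
        while True:
            new_pos = parser(text, pos)
            if new_pos is None or new_pos == pos:
                return pos
            pos = new_pos

    return parse


def choice(*parsers):
    """Ordered choice: the first alternative that succeeds wins."""

    def parse(text, pos):
        for parser in parsers:
            new_pos = parser(text, pos)
            if new_pos is not None:
                return new_pos
        return None

    return parse


junk = regex(r"(?!interface)[^\n]*\n")
interface = regex(r"interface[^\n]*\n")
text = "junk\ninterface a\n"

# Shaped like: invalid*=Junk | interfaces*=Interface
starred = choice(zero_or_more(junk), interface)
# Shaped like: Junk | interfaces=Interface
plain = choice(junk, interface)

print(starred(text, 0))  # consumes the junk line
print(starred(text, 5))  # succeeds without moving; interface is never tried
print(plain(text, 5))    # junk fails, so the interface alternative runs
```

Wrapping `starred` in another repetition would make no progress at position 5, which matches the symptom of the parser retrying Junk at position 35 forever. Dropping the inner `*` lets the choice fail over to the next alternative.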

I then updated the grammar to the following, which appears to get me a bit further and gets rid of the infinite loop. However, judging by the debugging information, when the parser exhausts the conditions within a rule, it seems to backtrack to an earlier position in the text.

Config:
    (
        (
            invalid*=Junk
            | interfaces*=Interface
        )#
    )*
;

Junk:
   /(?!((interface)|(vlan)|(description)).)[^\n]*\n/  // <- consume everything till the 'vlan', interface, or description
;

Interface[noskipws]:
   'interface'/\s*/ name=/[^\n]+\n/
   (
       ( description+=Description
       | vlans*=Vlan
       | invalids*=InterfaceJunk
       )#  // How does this get out from here - how does textx know to get out (if all 3 possibilities are not matched?)
   )*
;

Description[noskipws]:
    /\s+description[^\n]+\n/
;

Vlan[noskipws]:
    /\s+vlan[^\n]+\n/
;

InterfaceJunk[noskipws]:
    /(?!((interface)|(\s+vlan)|(\s+description)).)[^\n]*\n/  // <- consume everything till the 'vlan', interface, or description
;

Log now looks like:

*** PARSING MODEL ***
>> Matching rule Model=Sequence at position 0 => *junk irrel
   >> Matching rule Config=ZeroOrMore in Model at position 0 => *junk irrel
      >> Matching rule UnorderedGroup in Config at position 0 => *junk irrel
         >> Matching rule __asgn_zeroormore=ZeroOrMore[invalid] in Config at position 0 => *junk irrel
            ?? Try match rule Junk=RegExMatch((?!((interface)|(vlan)|(description)).)[^\n]*\n) in __asgn_zeroormore at position 0 => *junk irrel
            ++ Match 'junk
' at 0 => '*junk *irrel'
            ?? Try match rule Junk=RegExMatch((?!((interface)|(vlan)|(description)).)[^\n]*\n) in __asgn_zeroormore at position 5 => junk *irrelevant
            ++ Match 'irrelevant configuration line
' at 5 => 'junk *irrelevant configuration line *'
            ?? Try match rule Junk=RegExMatch((?!((interface)|(vlan)|(description)).)[^\n]*\n) in __asgn_zeroormore at position 35 => tion line *interface
            -- NoMatch at 35
         <<+ Matched rule __asgn_zeroormore=ZeroOrMore[invalid] in __asgn_zeroormore at position 35 => tion line *interface
         >> Matching rule __asgn_zeroormore=ZeroOrMore[interfaces] in Config at position 35 => tion line *interface
            >> Matching rule Interface=Sequence in __asgn_zeroormore at position 35 => tion line *interface
               ?? Try match rule StrMatch(interface) in Interface at position 35 => tion line *interface
               ++ Match 'interface' at 35 => 'tion line *interface* '
               ?? Try match rule RegExMatch(\s*) in Interface at position 44 =>  interface* GigabitEt
               ++ Match ' ' at 44 => ' interface* *GigabitEt'
               >> Matching rule __asgn_plain=Sequence[name] in Interface at position 45 => interface *GigabitEth
                  ?? Try match rule RegExMatch([^\n]+\n) in __asgn_plain at position 45 => interface *GigabitEth
                  ++ Match 'GigabitEthernet0/0/0
' at 45 => 'interface *GigabitEthernet0/0/0 *'
               <<+ Matched rule __asgn_plain=Sequence[name] in __asgn_plain at position 66 => rnet0/0/0 *   nonsens
               >> Matching rule ZeroOrMore in Interface at position 66 => rnet0/0/0 *   nonsens
                  >> Matching rule UnorderedGroup in Interface at position 66 => rnet0/0/0 *   nonsens
                     >> Matching rule __asgn_oneormore=OneOrMore[description] in Interface at position 66 => rnet0/0/0 *   nonsens
                        >> Matching rule Description=Sequence in __asgn_oneormore at position 66 => rnet0/0/0 *   nonsens
                           ?? Try match rule RegExMatch(\s+description[^\n]+\n) in Description at position 66 => rnet0/0/0 *   nonsens
                           -- NoMatch at 66
                        <<- Not matched rule Description=Sequence in Description at position 66 => rnet0/0/0 *   nonsens
                     <<- Not matched rule __asgn_oneormore=OneOrMore[description] in __asgn_oneormore at position 66 => rnet0/0/0 *   nonsens
                     >> Matching rule __asgn_zeroormore=ZeroOrMore[vlans] in Interface at position 66 => rnet0/0/0 *   nonsens
                        >> Matching rule Vlan=Sequence in __asgn_zeroormore at position 66 => rnet0/0/0 *   nonsens
                           ?? Try match rule RegExMatch(\s+vlan[^\n]+\n) in Vlan at position 66 => rnet0/0/0 *   nonsens
                           -- NoMatch at 66
                        <<- Not matched rule Vlan=Sequence in Vlan at position 66 => rnet0/0/0 *   nonsens
                     <<- Not matched rule __asgn_zeroormore=ZeroOrMore[vlans] in __asgn_zeroormore at position 66 => rnet0/0/0 *   nonsens
                     >> Matching rule __asgn_zeroormore=ZeroOrMore[invalids] in Interface at position 66 => rnet0/0/0 *   nonsens
                        >> Matching rule InterfaceJunk=Sequence in __asgn_zeroormore at position 66 => rnet0/0/0 *   nonsens
                           ?? Try match rule RegExMatch((?!((interface)|(\s+vlan)|(\s+description)).)[^\n]*\n) in InterfaceJunk at position 66 => rnet0/0/0 *   nonsens
                           ++ Match '   nonsense
' at 66 => 'rnet0/0/0 *   nonsense *'
                        <<+ Matched rule InterfaceJunk=Sequence in InterfaceJunk at position 78 =>  nonsense *   vlan 33
                        >> Matching rule InterfaceJunk=Sequence in __asgn_zeroormore at position 78 =>  nonsense *   vlan 33
                           ?? Try match rule RegExMatch((?!((interface)|(\s+vlan)|(\s+description)).)[^\n]*\n) in InterfaceJunk at position 78 =>  nonsense *   vlan 33
                           -- NoMatch at 78
                        <<- Not matched rule InterfaceJunk=Sequence in InterfaceJunk at position 78 =>  nonsense *   vlan 33
                     <<+ Matched rule __asgn_zeroormore=ZeroOrMore[invalids] in __asgn_zeroormore at position 78 =>  nonsense *   vlan 33
                     >> Matching rule __asgn_oneormore=OneOrMore[description] in Interface at position 78 =>  nonsense *   vlan 33
                        >> Matching rule Description=Sequence in __asgn_oneormore at position 78 =>  nonsense *   vlan 33
                           ?? Try match rule RegExMatch(\s+description[^\n]+\n) in Description at position 78 =>  nonsense *   vlan 33
                           -- NoMatch at 78
                        <<- Not matched rule Description=Sequence in Description at position 78 =>  nonsense *   vlan 33
                     <<- Not matched rule __asgn_oneormore=OneOrMore[description] in __asgn_oneormore at position 78 =>  nonsense *   vlan 33
                     >> Matching rule __asgn_zeroormore=ZeroOrMore[vlans] in Interface at position 78 =>  nonsense *   vlan 33
                        >> Matching rule Vlan=Sequence in __asgn_zeroormore at position 78 =>  nonsense *   vlan 33
                           ?? Try match rule RegExMatch(\s+vlan[^\n]+\n) in Vlan at position 78 =>  nonsense *   vlan 33
                           ++ Match '   vlan 33
' at 78 => ' nonsense *   vlan 33 *'
                        <<+ Matched rule Vlan=Sequence in Vlan at position 89 =>   vlan 33 *   descrip
                        >> Matching rule Vlan=Sequence in __asgn_zeroormore at position 89 =>   vlan 33 *   descrip
                           ?? Try match rule RegExMatch(\s+vlan[^\n]+\n) in Vlan at position 89 =>   vlan 33 *   descrip
                           -- NoMatch at 89
                        <<- Not matched rule Vlan=Sequence in Vlan at position 89 =>   vlan 33 *   descrip
                     <<+ Matched rule __asgn_zeroormore=ZeroOrMore[vlans] in __asgn_zeroormore at position 89 =>   vlan 33 *   descrip
                     >> Matching rule __asgn_oneormore=OneOrMore[description] in Interface at position 89 =>   vlan 33 *   descrip
                        >> Matching rule Description=Sequence in __asgn_oneormore at position 89 =>   vlan 33 *   descrip
                           ?? Try match rule RegExMatch(\s+description[^\n]+\n) in Description at position 89 =>   vlan 33 *   descrip
                           ++ Match '   description 3 and again
' at 89 => '  vlan 33 *   description 3 and again *'
                        <<+ Matched rule Description=Sequence in Description at position 116 => and again *   this an
                        >> Matching rule Description=Sequence in __asgn_oneormore at position 116 => and again *   this an
                           ?? Try match rule RegExMatch(\s+description[^\n]+\n) in Description at position 116 => and again *   this an
                           -- NoMatch at 116
                        <<- Not matched rule Description=Sequence in Description at position 116 => and again *   this an
                     <<+ Matched rule __asgn_oneormore=OneOrMore[description] in __asgn_oneormore at position 116 => and again *   this an
                  <<+ Matched rule UnorderedGroup in Interface at position 116 => and again *   this an
                  >> Matching rule UnorderedGroup in Interface at position 116 => and again *   this an
                     >> Matching rule __asgn_oneormore=OneOrMore[description] in Interface at position 116 => and again *   this an
                        >> Matching rule Description=Sequence in __asgn_oneormore at position 116 => and again *   this an
                           ?? Try match rule RegExMatch(\s+description[^\n]+\n) in Description at position 116 => and again *   this an
                           -- NoMatch at 116
                        <<- Not matched rule Description=Sequence in Description at position 116 => and again *   this an
                     <<- Not matched rule __asgn_oneormore=OneOrMore[description] in __asgn_oneormore at position 116 => and again *   this an
                     >> Matching rule __asgn_zeroormore=ZeroOrMore[vlans] in Interface at position 116 => and again *   this an
                        >> Matching rule Vlan=Sequence in __asgn_zeroormore at position 116 => and again *   this an
                           ?? Try match rule RegExMatch(\s+vlan[^\n]+\n) in Vlan at position 116 => and again *   this an
                           -- NoMatch at 116
                        <<- Not matched rule Vlan=Sequence in Vlan at position 116 => and again *   this an
                     <<- Not matched rule __asgn_zeroormore=ZeroOrMore[vlans] in __asgn_zeroormore at position 116 => and again *   this an
                     >> Matching rule __asgn_zeroormore=ZeroOrMore[invalids] in Interface at position 116 => and again *   this an
                        >> Matching rule InterfaceJunk=Sequence in __asgn_zeroormore at position 116 => and again *   this an
                           ?? Try match rule RegExMatch((?!((interface)|(\s+vlan)|(\s+description)).)[^\n]*\n) in InterfaceJunk at position 116 => and again *   this an
                           ++ Match '   this and that
' at 116 => 'and again *   this and that *'
                        <<+ Matched rule InterfaceJunk=Sequence in InterfaceJunk at position 133 =>  and that *   vlan 16
                        >> Matching rule InterfaceJunk=Sequence in __asgn_zeroormore at position 133 =>  and that *   vlan 16
                           ?? Try match rule RegExMatch((?!((interface)|(\s+vlan)|(\s+description)).)[^\n]*\n) in InterfaceJunk at position 133 =>  and that *   vlan 16
                           -- NoMatch at 133
                        <<- Not matched rule InterfaceJunk=Sequence in InterfaceJunk at position 133 =>  and that *   vlan 16
                     <<+ Matched rule __asgn_zeroormore=ZeroOrMore[invalids] in __asgn_zeroormore at position 133 =>  and that *   vlan 16
                     >> Matching rule __asgn_oneormore=OneOrMore[description] in Interface at position 133 =>  and that *   vlan 16
                        >> Matching rule Description=Sequence in __asgn_oneormore at position 133 =>  and that *   vlan 16
                           ?? Try match rule RegExMatch(\s+description[^\n]+\n) in Description at position 133 =>  and that *   vlan 16
                           -- NoMatch at 133
                        <<- Not matched rule Description=Sequence in Description at position 133 =>  and that *   vlan 16
                     <<- Not matched rule __asgn_oneormore=OneOrMore[description] in __asgn_oneormore at position 133 =>  and that *   vlan 16
                     >> Matching rule __asgn_zeroormore=ZeroOrMore[vlans] in Interface at position 133 =>  and that *   vlan 16
                        >> Matching rule Vlan=Sequence in __asgn_zeroormore at position 133 =>  and that *   vlan 16
                           ?? Try match rule RegExMatch(\s+vlan[^\n]+\n) in Vlan at position 133 =>  and that *   vlan 16
                           ++ Match '   vlan 16
' at 133 => ' and that *   vlan 16 *'
                        <<+ Matched rule Vlan=Sequence in Vlan at position 144 =>   vlan 16 *interface
                        >> Matching rule Vlan=Sequence in __asgn_zeroormore at position 144 =>   vlan 16 *interface
                           ?? Try match rule RegExMatch(\s+vlan[^\n]+\n) in Vlan at position 144 =>   vlan 16 *interface
                           -- NoMatch at 144
                        <<- Not matched rule Vlan=Sequence in Vlan at position 144 =>   vlan 16 *interface
                     <<+ Matched rule __asgn_zeroormore=ZeroOrMore[vlans] in __asgn_zeroormore at position 144 =>   vlan 16 *interface
                     >> Matching rule __asgn_oneormore=OneOrMore[description] in Interface at position 144 =>   vlan 16 *interface
                        >> Matching rule Description=Sequence in __asgn_oneormore at position 144 =>   vlan 16 *interface
                           ?? Try match rule RegExMatch(\s+description[^\n]+\n) in Description at position 144 =>   vlan 16 *interface
                           -- NoMatch at 144
                        <<- Not matched rule Description=Sequence in Description at position 144 =>   vlan 16 *interface
                     <<- Not matched rule __asgn_oneormore=OneOrMore[description] in __asgn_oneormore at position 144 =>   vlan 16 *interface
                  <<- Not matched rule UnorderedGroup in Interface at position 116 => and again *   this an
               <<+ Matched rule ZeroOrMore in Interface at position 116 => and again *   this an
            <<+ Matched rule Interface=Sequence in Interface at position 116 => and again *   this an
            >> Matching rule Interface=Sequence in __asgn_zeroormore at position 116 => and again *   this an
               ?? Try match rule StrMatch(interface) in Interface at position 116 => and again *   this an
               -- No match 'interface' at 116 => 'and again *   this a*n'
            <<- Not matched rule Interface=Sequence in Interface at position 116 => and again *   this an
         <<+ Matched rule __asgn_zeroormore=ZeroOrMore[interfaces] in __asgn_zeroormore at position 116 => and again *   this an
      <<+ Matched rule UnorderedGroup in Config at position 116 => and again *   this an
      >> Matching rule UnorderedGroup in Config at position 116 => and again *   this an
         >> Matching rule __asgn_zeroormore=ZeroOrMore[invalid] in Config at position 116 => and again *   this an
            ?? Try match rule Junk=RegExMatch((?!((interface)|(vlan)|(description)).)[^\n]*\n) in __asgn_zeroormore at position 119 =>  again    *this and t
            ++ Match 'this and that
' at 119 => ' again    *this and that *'
            ?? Try match rule Junk=RegExMatch((?!((interface)|(vlan)|(description)).)[^\n]*\n) in __asgn_zeroormore at position 136 => d that    *vlan 16 in
            -- NoMatch at 136
         <<+ Matched rule __asgn_zeroormore=ZeroOrMore[invalid] in __asgn_zeroormore at position 133 =>  and that *   vlan 16
         >> Matching rule __asgn_zeroormore=ZeroOrMore[interfaces] in Config at position 133 =>  and that *   vlan 16
            >> Matching rule Interface=Sequence in __asgn_zeroormore at position 133 =>  and that *   vlan 16
               ?? Try match rule StrMatch(interface) in Interface at position 133 =>  and that *   vlan 16
               -- No match 'interface' at 133 => ' and that *   vlan 1*6'
            <<- Not matched rule Interface=Sequence in Interface at position 133 =>  and that *   vlan 16
         <<- Not matched rule __asgn_zeroormore=ZeroOrMore[interfaces] in __asgn_zeroormore at position 133 =>  and that *   vlan 16
      <<+ Matched rule UnorderedGroup in Config at position 133 =>  and that *   vlan 16
      >> Matching rule UnorderedGroup in Config at position 133 =>  and that *   vlan 16
         >> Matching rule __asgn_zeroormore=ZeroOrMore[invalid] in Config at position 133 =>  and that *   vlan 16
            ?? Try match rule Junk=RegExMatch((?!((interface)|(vlan)|(description)).)[^\n]*\n) in __asgn_zeroormore at position 136 => d that    *vlan 16 in
            -- NoMatch at 136
         <<- Not matched rule __asgn_zeroormore=ZeroOrMore[invalid] in __asgn_zeroormore at position 133 =>  and that *   vlan 16
         >> Matching rule __asgn_zeroormore=ZeroOrMore[interfaces] in Config at position 133 =>  and that *   vlan 16
            >> Matching rule Interface=Sequence in __asgn_zeroormore at position 133 =>  and that *   vlan 16
               ?? Try match rule StrMatch(interface) in Interface at position 133 =>  and that *   vlan 16
               -- No match 'interface' at 133 => ' and that *   vlan 1*6'
            <<- Not matched rule Interface=Sequence in Interface at position 133 =>  and that *   vlan 16
         <<- Not matched rule __asgn_zeroormore=ZeroOrMore[interfaces] in __asgn_zeroormore at position 133 =>  and that *   vlan 16
      <<- Not matched rule UnorderedGroup in Config at position 133 =>  and that *   vlan 16
   <<+ Matched rule Config=ZeroOrMore in Config at position 133 =>  and that *   vlan 16
   ?? Try match rule EOF in Model at position 136 => d that    *vlan 16 in
   !! EOF not matched.
<<- Not matched rule Model=Sequence in Model at position 0 => *junk irrel
Traceback (most recent call last):
...

Any hints as to where my misconceptions are?


Revised and working grammar

After Igor's hints (thank you), I have been able to write the final grammar.tx, which parses successfully and yields the desired object results in the model generated by textX (see final answer).

There are 2 answers

Igor Dejanović (best answer)

What you are doing is usually called an island grammar. You can easily do that in textX, and you can easily extract the actual structure of the interfaces. Here is one possible solution:

from textx import metamodel_from_str

mmstr = r'''
Config:
    (
        /(?s)((?!interface).)*/   // <- consume everything till the keyword 'interface'
        interfaces=Interface
    )*
    /(?s).*/   // <- consume all content after the last interface
;

Interface:
    'interface' name=ID
    'description' description=/[^\n]*/
    /((?!vlan).)*/  // <- consume everything till the 'vlan'
    'vlan' vlan=INT;
'''

model_str = r'''
bootfile abc.bin
motd "this is a
message for everyone to follow"
.
.
.
group 1
group 2
.
.
.
permit tcp ab.b.c.d b.b.b.y eq 222
permit tcp ab.b.c.d b.b.b.y eq 222
permit tcp ab.b.c.d b.b.b.y eq 222
.
.
.
interface a
  description this is interface a
  vlan 33

interface b
  description craigs list
  no shut
  vlan 33
  no ip address
'''

mm = metamodel_from_str(mmstr)

model = mm.model_from_str(model_str)

for i in model.interfaces:
    print(i.name, i.description, i.vlan)

You can always put textX in debug mode (by passing debug=True) to see the parsing process and validate your assumptions.
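
The skip-ahead regex in the Config rule of the answer above can also be checked on its own with `re`. This is a minimal sketch with a made-up sample string; it only shows how the pattern stops in front of the keyword. One caveat that follows from how the lookahead works: it is tested at every character, so the pattern would also stop in front of the letters "interface" inside a longer word.

```python
import re

# Same pattern as the first regex in the Config rule above.
SKIP = re.compile(r"(?s)((?!interface).)*")

text = "bootfile abc.bin\ngroup 1\ninterface a\n  vlan 33\n"

m = SKIP.match(text)
skipped, rest = m.group(0), text[m.end():]
print(repr(skipped))  # everything before the keyword
print(repr(rest))     # begins at 'interface'

# Starting right at the keyword, the pattern still succeeds with an empty match.
empty = SKIP.match(text, m.end())
print(repr(empty.group(0)))
```

Because the pattern can match the empty string, it never fails, which is why the grammar can place it unconditionally in front of every Interface.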

Addition - interface elements in arbitrary order

Interface:
    'interface' name=ID
    ('description' description=/[^\n]*/
     | 'vlan' vlan=INT
     | /(?s)(?!interface)./    // <- Consume a single char if not description or vlan or interface
    )*        // <- and then repeat
;

This will work for description and vlan at any position, but note that the attributes on the resulting Python object will now be lists, as this grammar supports multiple vlan and description entries per interface. You can fix this by registering an object processor for Interface and extracting the only element of the list. Something like:

def interface_processor(obj):
    if obj.description:
        obj.description = obj.description[0]
    if obj.vlan:
        obj.vlan = obj.vlan[0]

Don't forget to register this object processor. See textX docs for the details.
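
A sketch of the registration step, assuming the metamodel object is the one returned by `metamodel_from_str`. `register_obj_processors` takes a dict keyed by rule name. The `attach_processors` helper and the stand-in object are hypothetical and exist only so the processor can be exercised without running a parser.

```python
from types import SimpleNamespace


def interface_processor(obj):
    """Collapse the one-element lists produced by the repetition."""
    if obj.description:
        obj.description = obj.description[0]
    if obj.vlan:
        obj.vlan = obj.vlan[0]


def attach_processors(metamodel):
    """Register the processor under the grammar rule name 'Interface'."""
    metamodel.register_obj_processors({"Interface": interface_processor})


# Stand-in for a parsed Interface object (hypothetical values).
fake = SimpleNamespace(name="b", description=["craigs list"], vlan=[33])
interface_processor(fake)
print(fake.description, fake.vlan)
```

In real use, `attach_processors(mm)` would be called after the metamodel is created and before `mm.model_from_str(...)`, so that every Interface object passes through the processor as it is built.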

user190270

Final working grammar

After applying Igor's suggestion, this grammar succeeds at parsing and generates the desired object.

Config:
    (
            Junk
            | interfaces=Interface
    )*
;

Junk:
   /(?!((interface)|(vlan)|(description)).)[^\n]*\n/
;

Interface[noskipws]:
   'interface'/\s*/ name=/[^\n]+\n/
   (
       description+=Description
       | vlans=Vlan
       | invalids=InterfaceJunk
   )*
;

Description[noskipws]:
    /\s+description[^\n]+\n/
;

Vlan[noskipws]:
    /\s+vlan[^\n]+\n/
;

InterfaceJunk[noskipws]:
    /(?!((interface)|(\s+vlan)|(\s+description)).)[^\n]*\n/
;

With the following changes: