Parsing XML-type file with LPeg re module

I'm trying to learn LPeg's re module and it has been quite an interesting experience, especially since the official documentation is so nice.

However, some topics seem to be poorly explained there, for example the named group capture construction: {:name: p :}.

Consider the following example. I don't understand why it does not match:

print(re.compile
  [[item <- ('<' {:tag: %w+!%w :} '>' item+ '</' =tag '>') / %w+!%w]]
  :match[[<person><name>James</name><address>Earth</address></person>]])

-- outputs nil

Can anyone help me understand what is going wrong here? I have thought about this quite a bit, and it really seems like I'm missing something important.

There is 1 answer

wqw On BEST ANSWER

This is a late answer, but you can try the following pattern:

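local re = require("re")

-- Each element is wrapped in a table capture {| |}; element text is captured with { }.
result = re.compile[[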
  item <- ({| %s* '<' {:tag: %w+ :} %s* '>' (item / %s* { (!(%s* '<') .)+ }) %s* '</' =tag '>' |})+
]]:match[[
<person>
    <name>
    James
    </name>
    <address>Earth</address>
</person>
]]

This uses table captures to parse the XML and strips the surrounding whitespace from element text. The resulting table looks like this:

tag = "person"
[1] = {
  tag = "name"
  [1] = "James"
}
[2] = {
  tag = "address"
  [1] = "Earth"
}
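
For what it's worth, the reason the table capture matters seems to be the scoping rule for back captures. As far as I understand the LPeg manual, =tag refers to the most recent complete, outermost group capture named tag. Without a wrapper, the tag groups of child elements that have already matched stay visible, so the closing </person> ends up being compared against "address". Here is a minimal sketch of the difference, assuming LPeg is installed and its re module can be loaded with require("re"). The names flat, wrapped and input are only for illustration, and both patterns are simplified (no whitespace handling):

```lua
local re = require("re")

local input = "<person><name>James</name><address>Earth</address></person>"

-- Flat version (simplified from the question): no capture encloses an element,
-- so the `tag` groups of finished child elements remain visible to the parent.
local flat = re.compile[[
  item <- ('<' {:tag: %w+ :} '>' item+ '</' =tag '>') / %w+
]]

-- Wrapped version (simplified from the answer): each element sits inside a
-- table capture, so once a child completes, its `tag` group is hidden from
-- the parent's =tag lookup.
local wrapped = re.compile[[
  item <- {| '<' {:tag: %w+ :} '>' item+ '</' =tag '>' |} / { %w+ }
]]

-- A single, non-nested element still matches with the flat pattern.
local single = flat:match("<a>x</a>")

-- Nested siblings break it: the last =tag sees "address", not "person".
local nested = flat:match(input)
print(nested)  --> nil

local t = wrapped:match(input)
print(t.tag, t[1].tag, t[1][1], t[2].tag, t[2][1])
--> person  name  James  address  Earth
```

So the named group in the question is fine by itself; it is the missing enclosing capture around each element that lets the inner tags leak out.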