The xml-conduit documentation only lists examples where the entire XML tree is consumed by a ConduitM
, for example:
<people>
<person age="25">Michael</person>
<person age="2">Eliezer</person>
</people>
I'm trying to parse a tree where besides the <person>
tags from above, there are deeply nested subtrees I'm not interested in (their exact schema might even be unknown), for example:
<people>
<person age="25">Michael</person>
<tagImNotInterestedIn><!-- deeply nested complex subtree --></tagImNotInterestedIn>
<person age="2">Eliezer</person>
</people>
When parsing with the people.hs
example from the docs, I get the following exception:
people.hs: XmlException {xmlErrorMessage = "Expected end tag for: Name {nameLocalName = \"people\", nameNamespace = Nothing, namePrefix = Nothing}", xmlBadInput = Just (EventBeginElement (Name {nameLocalName = "tagImNotInterestedIn", nameNamespace = Nothing, namePrefix = Nothing}) [])}
Basically, I'm looking for a way to ignore any tag (including all its children and attributes) except specific ones I specify parsers for. When using DOM-based parsers like HXT, this is obviously easy, but the tag
docs explicitly states that it will fail unless all children are consumed.
The only hypothetical way I can think of accomplishing this is to use functions from Control.Exception
to build up a Conduit with a Maybe a
result (returning Nothing
on exception) and then use orE
to combine it with the parsers itself
Although it has been stated that the xml-conduit API needs some updating, I think there has to be a less-hackish way to ignore an entire subtree. Any ideas will be appreciated!
Since 1.5.0
Text.XML.Stream.Parse
provides a function takeTree, which probably could be used for the purpose.The code above is based on xml-conduit sample. Only
parsePeople
is changed.