xml-conduit: How to Modify a Document?

The xml-conduit tutorial (the only one in existence, and perhaps the only Haskell XML library with a tutorial) shows how to create or read an XML document, but not how to modify one. The only way I know to do such operations is with lxml/ElementTree (Python), which, as far as I'm aware, works only through side effects. I suspect a very different approach is needed here.

Say that I have a simple document:

<html>
    <head>
        <title>My <b>Title</b></title>
    </head>
    <body>
        <p>Paragraph 1.</p>
        <p>Paragraph 2.</p>
    </body>
</html>

How to:
- Modify the title?
- Delete the first paragraph in this document?
- Append the body of this document to the body of another document?

Feel free to propose and contribute a solution using other Haskell libraries. The community could use many more examples.

There are 2 answers

Bjartur Thorlacius On

By reading the XML document and writing a new one, keeping the similarities you want but differing in the respects you desire.
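
As a minimal sketch of that read, transform, write cycle, assuming the `Text.XML` module from xml-conduit: `transform` is a hypothetical placeholder for whatever pure `Document -> Document` function you write, and the input string is the question's document with whitespace removed.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Text.Lazy.IO as TL
import Text.XML

-- Hypothetical placeholder: any pure Document -> Document function goes here.
transform :: Document -> Document
transform = id

main :: IO ()
main = do
  let input = "<html><head><title>My <b>Title</b></title></head><body><p>Paragraph 1.</p><p>Paragraph 2.</p></body></html>"
      document  = parseText_ def input     -- parse into an immutable Document
      document' = transform document       -- build a modified copy
  TL.putStrLn (renderText def document')   -- serialize the new Document
```

Nothing is mutated in place: the original `document` value is still available after `transform` runs. For files, `Text.XML.readFile` and `Text.XML.writeFile` can take the place of `parseText_` and `renderText`.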

Say you have a document :: Document. If you prefer record syntax over lenses, you might wind up with a solution that looks somewhat like the following. To be fair, refactoring it into small functions with descriptive names can make it somewhat more readable. Alternatively, you can use lenses, a library of small, generic functions with nondescript names that are useful for exactly this kind of DOM tree manipulation.

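import Data.Function ((&))
import Text.XML

-- Keeps only the last child node of <body>, which removes the first paragraph.
-- The lambda's pattern assumes the root has exactly two child nodes (<head>
-- and <body>), so whitespace-only text nodes must have been stripped first.
keepLastBodyNode :: Document -> Document
keepLastBodyNode document = document{ documentRoot=
    (documentRoot document){ elementNodes=
        documentRoot document
        & elementNodes
        & (\[head,NodeElement body]->
            [head,NodeElement body{elementNodes=
                [elementNodes body & last]
        }])
    }
  }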
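
The "small functions with descriptive names" variant could look roughly like the sketch below. It is not the code above refactored line by line: it matches elements by name, so it tolerates whitespace text nodes between elements. All helper names (`onChild`, `setTitle`, `deleteFirstParagraph`, `bodyNodes`, `appendBody`) are made up for illustration; only the `Text.XML` types and functions come from xml-conduit.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)
import qualified Data.Text.Lazy.IO as TL
import Text.XML

-- Apply f to every child element with the given name; leave other nodes alone.
onChild :: Name -> (Element -> Element) -> Element -> Element
onChild name f parent = parent { elementNodes = map step (elementNodes parent) }
  where
    step (NodeElement e) | elementName e == name = NodeElement (f e)
    step node = node

-- Replace the whole content of <title> with a single text node.
setTitle :: Text -> Document -> Document
setTitle new doc =
  doc { documentRoot = onChild "head" (onChild "title" replace) (documentRoot doc) }
  where
    replace e = e { elementNodes = [NodeContent new] }

-- Drop the first <p> child of <body>, keeping every other node.
deleteFirstParagraph :: Document -> Document
deleteFirstParagraph doc =
  doc { documentRoot = onChild "body" dropP (documentRoot doc) }
  where
    dropP body = body { elementNodes = dropFirst (elementNodes body) }
    dropFirst (NodeElement e : rest) | elementName e == "p" = rest
    dropFirst (n : rest) = n : dropFirst rest
    dropFirst [] = []

-- Collect the child nodes of every <body> directly under the root.
bodyNodes :: Document -> [Node]
bodyNodes doc = concat
  [ elementNodes e
  | NodeElement e <- elementNodes (documentRoot doc)
  , elementName e == "body"
  ]

-- Append the body content of `source` to the body of `target`.
appendBody :: Document -> Document -> Document
appendBody source target =
  target { documentRoot = onChild "body" extend (documentRoot target) }
  where
    extend body = body { elementNodes = elementNodes body ++ bodyNodes source }

main :: IO ()
main = do
  let doc    = parseText_ def "<html><head><title>My <b>Title</b></title></head><body><p>Paragraph 1.</p><p>Paragraph 2.</p></body></html>"
      other  = parseText_ def "<html><head><title>Other</title></head><body><p>Other paragraph.</p></body></html>"
      edited = deleteFirstParagraph (setTitle "New Title" doc)
      merged = appendBody edited other
  TL.putStrLn (renderText def edited)
  TL.putStrLn (renderText def merged)
```

Each helper returns a new `Document` and leaves its argument untouched, so the three edits compose by ordinary function application. Note that `setTitle` here discards the nested `<b>` element; keeping it would need a different replacement node list.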
dabingsou On

Another method, this time in Python with the simplified_scrapy library:

from simplified_scrapy import SimplifiedDoc 
html = '''<html>
    <head>
        <title>My <b>Title</b></title>
    </head>
    <body>
        <p>Paragraph 1.</p>
        <p>Paragraph 2.</p>
    </body>
</html>'''
doc = SimplifiedDoc(html)
title = doc.title
title.setContent('Modify <b>Title</b>')
firstP = doc.body.p
firstP.repleaceSelf("")
p = doc.p
p.insertAfter(p.outerHtml)
print (doc.html)

Result:

<html>
    <head>
        <title>Modify <b>Title</b></title>
    </head>
    <body>

        <p>Paragraph 2.</p><p>Paragraph 2.</p>
    </body>
</html>

Here are more examples: https://github.com/yiyedata/simplified-scrapy-demo/tree/master/doc_examples