Read XML processing instructions with attributes

Question

Asked by Nico Schlömer on 12 May 2023 at 13:21 · 115 views

I have an XML file

<?xml version="1.0" encoding="UTF-8"?>
<?foo class="abc" options="bar,baz"?>
<document>
 ...
</document>

and I'm interested in the processing instruction foo and its attributes.

I can use ET.iterparse for reading the PI, but it escapes me how to access the attributes as a dictionary – .attrib only gives an empty dict.

import xml.etree.ElementTree as ET

for _, elem in ET.iterparse("data.xml", events=("pi",)):
    print(repr(elem.tag))
    print(repr(elem.text))
    print(elem.attrib)

<function ProcessingInstruction at 0x7f848f2f7ba0>
'foo class="abc" options="bar,baz"'
{}

Any hints?

There are 3 answers

**Michael Kay** · Answer 1 · 2023-05-12T14:40:02+00:00

While the contents of the PI look rather like attributes, this is just a convention that the author of this document has adopted. It's not something defined by the XML spec, and therefore it's not something supported in data models like DOM and XDM. Such contents are sometimes called "pseudo-attributes".

You'll either have to parse them yourself by hand, or find a library that does it for you. Saxon has an XPath extension function saxon:get-pseudo-attribute(); other libraries may have something similar.
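
Parsing them by hand could look roughly like the sketch below. The function name `parse_pseudo_attributes` and the regular expression are illustrative assumptions, not part of any library. The pattern is deliberately simplified: it accepts `name="value"` or `name='value'` pairs, does not implement the full XML name grammar, and does not resolve entity references such as `&amp;`.

```python
import re

# Simplified pattern for name="value" or name='value' pairs.
# Assumption: names consist of word characters, ":", "." and "-".
_PSEUDO_ATTR = re.compile(r"""([\w:.-]+)\s*=\s*(?:"([^"]*)"|'([^']*)')""")


def parse_pseudo_attributes(pi_text):
    """Return the pseudo-attributes found in the PI content as a dict."""
    attrs = {}
    for name, double_quoted, single_quoted in _PSEUDO_ATTR.findall(pi_text):
        # Exactly one of the two quoted groups can be non-empty.
        attrs[name] = double_quoted or single_quoted
    return attrs


print(parse_pseudo_attributes('class="abc" options="bar,baz"'))
```

Anything in the PI content that does not match the pattern is silently skipped, which may or may not be what you want.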

**LMC** · Answer 2 · 2023-05-12T14:55:51+00:00

Use the Python lxml module to read the PI content, build an element string from it, and parse that:

>>> from lxml import etree
>>> tree = etree.parse("tmp.xml")
>>> pi = tree.xpath('//processing-instruction("foo")')
>>> pi[0].text
'class="abc" options="bar,baz"'
>>> root = etree.fromstring(f"<root {pi[0].text}/>")
>>> root.get('options')
'bar,baz'

Note: by default, ElementTree skips processing instructions when building the tree.

**Nico Schlömer** · Answer 3 · 2023-05-13T08:57:58+00:00

The string content of a processing instruction can theoretically be anything. In many cases, though, it looks like an XML element with attributes. To parse it, one can construct an element string from it and parse that, e.g.:

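import xml.etree.ElementTree as ET

# The "pi" event requires Python 3.8 or later.
for _, elem in ET.iterparse("data.xml", events=("pi",)):
    # elem.text holds the PI target followed by its content,
    # e.g. 'foo class="abc" options="bar,baz"'
    _elem = ET.fromstring(f"<{elem.text}/>")
    print(_elem.tag)     # the PI target
    print(_elem.attrib)  # the pseudo-attributes as a dict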
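
Since the PI content can be anything, a slightly more defensive variant might be worth having. The sketch below is only one way to do it: the helper name `read_pi_attributes` and the fixed wrapper tag `pi` are assumptions made for illustration. It looks for the first PI with a given target and returns `None` when the PI is missing or its content is not attribute-like.

```python
import io
import xml.etree.ElementTree as ET


def read_pi_attributes(source, target):
    """Return the pseudo-attributes of the first PI named `target`, or None.

    `source` is a filename or a binary file object, as accepted by
    ET.iterparse. The "pi" event requires Python 3.8 or later.
    """
    for _, elem in ET.iterparse(source, events=("pi",)):
        # ElementTree stores "<target> <content>" in elem.text.
        name, _, content = elem.text.partition(" ")
        if name != target:
            continue
        try:
            # Wrap the content in a throwaway element (hypothetical tag "pi")
            # so the XML parser handles quoting and entity references.
            return dict(ET.fromstring(f"<pi {content}/>").attrib)
        except ET.ParseError:
            # The content is not a list of well-formed pseudo-attributes.
            return None
    return None


doc = (
    b'<?xml version="1.0" encoding="UTF-8"?>\n'
    b'<?foo class="abc" options="bar,baz"?>\n'
    b"<document>...</document>"
)
print(read_pi_attributes(io.BytesIO(doc), "foo"))
```

Using a fixed wrapper tag instead of the PI target avoids a parse failure when the target contains a colon, which the parser would otherwise treat as an undeclared namespace prefix.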
