I have an XML file:
<?xml version="1.0" encoding="UTF-8"?>
<?foo class="abc" options="bar,baz"?>
<document>
...
</document>
and I'm interested in the processing instruction foo and its attributes.
I can use ET.iterparse for reading the PI, but it escapes me how to access the attributes as a dictionary – .attrib only gives an empty dict.
import xml.etree.ElementTree as ET
for _, elem in ET.iterparse("data.xml", events=("pi",)):
    print(repr(elem.tag))
    print(repr(elem.text))
    print(elem.attrib)
<function ProcessingInstruction at 0x7f848f2f7ba0>
'foo class="abc" options="bar,baz"'
{}
Any hints?
While the contents of the PI look rather like attributes, this is just a convention that the author of this document has adopted. It is not defined by the XML spec, and therefore it is not supported in data models like DOM and XDM. As far as the parser is concerned, the content of a PI is a single opaque string, which is why .attrib is empty and everything ends up in .text. Such name="value" pairs are sometimes called "pseudo-attributes".
You'll either have to parse them yourself by hand, or find a library that does it for you. Saxon has an XPath extension function, saxon:get-pseudo-attribute(); other libraries may have something similar.
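If you go the hand-parsing route in Python, something along these lines might be enough. This is only a minimal sketch: the helper names parse_pseudo_attributes and iter_pi_attributes are made up for illustration, and it assumes the PI text has the shape shown in the question (target, whitespace, then name="value" pairs in single or double quotes). It does not resolve entity or character references inside the values.

```python
import re
import xml.etree.ElementTree as ET

# One name="value" or name='value' pair (assumed pseudo-attribute syntax).
_PSEUDO_ATTR = re.compile(r"""([^\s=]+)\s*=\s*(?:"([^"]*)"|'([^']*)')""")


def parse_pseudo_attributes(pi_text):
    """Split PI text such as 'foo class="abc"' into (target, dict of pairs).

    Hypothetical helper, not part of ElementTree.
    """
    # ElementTree puts the target and the data into .text, separated by whitespace.
    parts = (pi_text or "").split(None, 1)
    target = parts[0] if parts else ""
    data = parts[1] if len(parts) > 1 else ""

    attrs = {}
    for match in _PSEUDO_ATTR.finditer(data):
        name, double_quoted, single_quoted = match.groups()
        attrs[name] = double_quoted if double_quoted is not None else single_quoted
    return target, attrs


def iter_pi_attributes(source):
    """Yield (target, attrs) for every PI in the file.

    Hypothetical helper; relies on the "pi" event used in the question.
    """
    for _, elem in ET.iterparse(source, events=("pi",)):
        yield parse_pseudo_attributes(elem.text)


if __name__ == "__main__":
    print(parse_pseudo_attributes('foo class="abc" options="bar,baz"'))
```

With the file from the question, iterating over iter_pi_attributes("data.xml") should give you the target name and a plain dict for each PI, so you can pick out foo and read its values by key. Splitting options on the comma is left to the caller, since that is again just a convention of this particular document.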