How to prevent expat from automatically substituting entities?


Say I have an expat parser instantiated like so:

import xml.parsers.expat

def on_character_data(data):
    # Called with character data after expat has expanded entity references.
    print(data)

parser = xml.parsers.expat.ParserCreate(encoding=encoding)
...
parser.CharacterDataHandler = on_character_data
...

And an XML document like so:

<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
  </head>
<body>
  ampersands &amp; other annoyances
</body>
</html>

If I call parser.Parse(test_xml_string), the handler on_character_data() receives the text ampersands &amp; other annoyances as ampersands & other annoyances, with &amp; replaced by &. I want expat to leave these entities alone, so that on_character_data() receives the unmodified ampersands &amp; other annoyances. Is there any way I can do this?
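For reference, here is a minimal, self-contained sketch that reproduces the behaviour described above and shows one possible workaround: re-escaping the text inside the handler with xml.sax.saxutils.escape. The names TEST_XML, parse_text and reescape are made up for this illustration, and the document is a shortened copy of the one above. This does not stop expat from expanding the entity. It only restores an escaped form afterwards, so it is lossy: a numeric reference such as &#38; would also come back as &amp;, and escape() additionally rewrites < and > (for example, text that came from a CDATA section).

```python
import xml.parsers.expat
from xml.sax.saxutils import escape

# Shortened copy of the document from the question (hypothetical name).
TEST_XML = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<body>
  ampersands &amp; other annoyances
</body>
</html>"""


def parse_text(xml_string, reescape=False):
    """Collect all character data; optionally re-escape it in the handler."""
    chunks = []

    def on_character_data(data):
        # By the time this runs, expat has already replaced &amp; with &.
        # expat may deliver the text in several pieces, so collect them.
        chunks.append(escape(data) if reescape else data)

    parser = xml.parsers.expat.ParserCreate()
    parser.CharacterDataHandler = on_character_data
    parser.Parse(xml_string, True)  # True marks the final chunk of input
    return "".join(chunks)


decoded = parse_text(TEST_XML)                   # entities expanded by expat
restored = parse_text(TEST_XML, reescape=True)   # escaped again afterwards
print(decoded.strip())
print(restored.strip())
```

Whether this is acceptable depends on whether the exact original spelling of each reference matters. If it does, re-escaping is not enough, since the information about which form was used in the source is already gone when the handler is called.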
