I need to parse a large (>800MB) XML file from Jython. The XML is not deeply nested, containing about a million relevant elements. I need to convert these elements into real objects.
I've used nu.xom.* successfully before, but now that I've switched from Java to Jython, the library fails with the following message:
The parser has encountered more than "64,000" entity expansions in this document; this is the limit imposed by the application.
I have not found a way to fix this, so I probably have to look for another XML library. It could be either Java or Jython-compatible Python, and it should be efficient. Pythonic would be great: nu.xom.* is simple but not very Pythonic. Do you have any suggestions?
Does Jython support xml.etree.ElementTree? If so, use the iterparse method to keep your memory size down. Read this and use elem.clear() as described, so that each element is released once you have processed it.
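A rough sketch of what that could look like, assuming xml.etree.ElementTree is available in your Jython version (written against the Python 2.7 / 3 API, so it needs the built-in next()). The function name iter_records, the tag name "item", and the dict-based record are placeholders for illustration, not anything from your file; swap in your own class where the dict is built. Clearing the root as well as the element is a common addition to the elem.clear() idiom that keeps emptied elements from piling up under the root.

```python
import xml.etree.ElementTree as ET


def iter_records(source, tag):
    """Yield one plain dict per <tag> element, freeing each element after use.

    `source` is a file name or a file-like object opened in binary mode.
    `tag` is a hypothetical element name; use whatever your file contains.
    """
    # Request "start" events as well, so the first event hands us the root.
    context = iter(ET.iterparse(source, events=("start", "end")))
    _, root = next(context)

    for event, elem in context:
        # An element is only complete (text and children parsed) at "end".
        if event == "end" and elem.tag == tag:
            # Copy what you need into a real object before discarding.
            record = dict(elem.attrib)
            record["text"] = elem.text
            yield record

            # Drop the element's contents, then detach it from the root
            # so the partially built tree does not keep growing.
            elem.clear()
            root.clear()


if __name__ == "__main__":
    import io

    # Tiny in-memory document standing in for the 800MB file.
    sample = io.BytesIO(
        b'<items><item id="1">a</item><item id="2">b</item></items>'
    )
    for rec in iter_records(sample, "item"):
        print(rec)
```

Because the function is a generator, only one element's data is held at a time, which is what keeps memory flat on a file with a million elements. Whether this avoids the entity expansion limit under Jython is something I have not verified.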