I need to parse a large (>800MB) XML file from Jython. The XML is not deeply nested, containing about a million relevant elements. I need to convert these elements into real objects.
I've used nu.xom.* successfully before, but now that I've switched from Java to Jython, the library fails with the following message:
The parser has encountered more than "64,000" entity expansions in this document; this is the limit imposed by the application.
I have not found a way to fix this, so I probably have to look for another XML library. It could be either Java or Jython-compatible Python and should be efficient. Pythonic would be great, nu.xom.* is simple but not very pythonic. Do you have any suggestions?
Does jython support
xml.etree.ElementTree? If so, use theiterparsemethod to keep your memory size down. Read this and use elem.clear() as described.