I'm looking for a library to help me parse and transform DTDs using Python. The only thing I have found so far is xmlproc, but that seems ancient and doesn't seem to support serialization of DTDs. There's this for Java but I'd prefer a Python solution.
Edit: by "serialization" of DTDs I mean that ideally I'd like to be able to parse the DTD to some kind of Python structure, operate on that structure and then write out the result back to a DTD.
I don't know of an end-to-end processor for DTDs, but I rarely use DTDs at all, so that's not surprising.
Amara can parse DTDs, but I don't know how much access it gives you to them or whether the results can be serialized. I assume they can, but I haven't verified that. libxml2, which is available in Python through the lxml bindings, is something else to investigate, though I have even less experience with it. From the libxml documentation, it seems you would have access to the full DTD.
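As a rough illustration of the lxml route, here is a minimal sketch that loads a DTD and walks its element and attribute declarations. The sample DTD is made up for the example, and it assumes a reasonably recent lxml in which the `DTD` class exposes `iterelements()` and `iterattributes()`. As far as I know, lxml has no direct method to write the DTD object back out, so serialization would still be up to you.

```python
from io import BytesIO

from lxml import etree

# Hypothetical sample DTD, used only for illustration.
DTD_SOURCE = b"""
<!ELEMENT list (item+)>
<!ELEMENT item (#PCDATA)>
<!ATTLIST item id CDATA #REQUIRED>
"""

# etree.DTD accepts a file-like object (or a filename).
dtd = etree.DTD(BytesIO(DTD_SOURCE))

# Map each declared element name to the names of its declared attributes.
declared = {}
for element in dtd.iterelements():
    declared[element.name] = [attr.name for attr in element.iterattributes()]
    print(element.name, element.type, declared[element.name])
```

Each element declaration also has a `content` attribute describing the content model as a small tree, which is probably where any transformation logic would start.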
Another possibility is to convert the DTD to XSD with one of the many available converters, manipulate the resulting tree with a regular XML processor, and then convert it back to a DTD. I worry about how lossy that round trip might be.
With more effort, you could write a parser for the DTD grammar yourself; if you go that way, consider PyParsing or PLY.
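To give an idea of what the parse, transform, and write-back cycle could look like, here is a toy sketch using only the standard library's `re` module rather than PyParsing or PLY. It handles nothing but simple `<!ELEMENT ...>` declarations and keeps the content model as raw text; comments, `ATTLIST`, entities, and conditional sections are all ignored. The names `ElementDecl`, `parse_elements`, and `serialize` are invented for this example.

```python
import re
from dataclasses import dataclass

# Matches <!ELEMENT name content-model>; a deliberately naive pattern.
ELEMENT_RE = re.compile(r"<!ELEMENT\s+(\S+)\s+(.*?)\s*>", re.DOTALL)


@dataclass
class ElementDecl:
    name: str
    content: str  # raw content model, kept as unparsed text

    def to_dtd(self) -> str:
        return f"<!ELEMENT {self.name} {self.content}>"


def parse_elements(dtd_text):
    """Return one ElementDecl per <!ELEMENT ...> declaration found."""
    return [ElementDecl(m.group(1), m.group(2))
            for m in ELEMENT_RE.finditer(dtd_text)]


def serialize(decls):
    """Write the declarations back out, one per line."""
    return "\n".join(d.to_dtd() for d in decls)


source = """<!ELEMENT list (item+)>
<!ELEMENT item (#PCDATA)>"""

decls = parse_elements(source)

# Example transformation: rename the element "item" to "entry" everywhere.
for decl in decls:
    if decl.name == "item":
        decl.name = "entry"
    decl.content = re.sub(r"\bitem\b", "entry", decl.content)

result = serialize(decls)
print(result)
```

A real grammar-based parser would replace the regular expression with proper rules for content models and the other declaration types, but the overall shape (parse into objects, mutate, serialize) would stay the same.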