The MDN HTML element reference pages list information about each element, including:
- content categories
- permitted content
- tag omission
Is there a structured text file from which this is derived? Or that contains equivalent info?
I found an appendix in the HTML5 spec that lists categories and children (except for palpable content, according to a well-hidden comment), and a later table collects some of the palpable content. That's great, but I'd rather not try to scrape data meant for human consumption.
I ask because I've got some code that maintains ElementContainmentRelationships that were hand-derived from an older version of the specification.
I'd like to be able to track the specification more easily, so ideally I'd use some HTML equivalent of the Unicode Character Database (UCD) property files: tabular data meant for machine processing.
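To make the request concrete, here is a minimal sketch of what such a file and its consumer might look like. The semicolon-separated layout, the column order, and the names `SAMPLE`, `parse_element_table`, and `may_contain` are all hypothetical; no such file exists as far as I know, and the four rows are simplified from the spec (for example, `ul` also permits script-supporting elements).

```python
# Hypothetical UCD-style table: one element per line, fields separated by ";",
# "#" starts a comment, multi-valued fields are space-separated.
SAMPLE = """\
# element ; content categories ; permitted content
p  ; flow palpable          ; phrasing
em ; flow phrasing palpable ; phrasing
ul ; flow                   ; li
li ;                        ; flow
"""


def parse_element_table(text):
    """Parse the hypothetical table into {element: {"categories", "content"}}."""
    table = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        name, categories, content = (field.strip() for field in line.split(";"))
        table[name] = {
            "categories": set(categories.split()),
            "content": set(content.split()),
        }
    return table


def may_contain(table, parent, child):
    """True if child is permitted by name or via one of its categories."""
    allowed = table[parent]["content"]
    return child in allowed or bool(allowed & table[child]["categories"])


if __name__ == "__main__":
    table = parse_element_table(SAMPLE)
    print(may_contain(table, "p", "em"))
    print(may_contain(table, "p", "ul"))
```

A lookup like `may_contain` ignores the parser's special cases, which matches the "mostly correct" goal rather than full fidelity.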
I understand that the HTML5 chapter on parsing has lots of caveats and special cases, but I'm looking for something that is mostly correct and tracks the specification.
The MDN references are hand-crafted. The CSS data was recently converted to a machine-readable format (see https://github.com/mdn/data) and the team is willing to provide more of MDN's data in such a format, but they have limited resources, so I wouldn't get my hopes up.
In case you're not aware, the Firefox/Gecko implementation of the HTML parser is translated from a Java implementation (the Validator.nu HTML parser). It may be of help, although a quick look didn't turn up any tables like the ones you're looking for.
There are also RELAX NG (RNG) schemas for HTML5 available as part of the same validator project.
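If you go the schema route, a rough sketch of pulling parent/child element names out of a RELAX NG grammar could look like the following. It assumes the schema is in the RELAX NG XML syntax (a schema in the compact syntax would first need converting, for example with a tool such as Trang). The inline grammar and the define names `ul.elem` and `li.elem` are made up for illustration and are not copied from the validator's schemas; `direct_child_elements` is a hypothetical helper that follows only one level of `ref`.

```python
import xml.etree.ElementTree as ET

RNG_NS = "http://relaxng.org/ns/structure/1.0"

# Tiny illustrative grammar, not taken from any real schema.
SAMPLE = """\
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <define name="ul.elem">
    <element name="ul">
      <zeroOrMore>
        <ref name="li.elem"/>
      </zeroOrMore>
    </element>
  </define>
  <define name="li.elem">
    <element name="li">
      <text/>
    </element>
  </define>
</grammar>
"""


def direct_child_elements(rng_xml):
    """Map each named element to the elements reachable through one <ref>."""
    root = ET.fromstring(rng_xml)
    ns = {"rng": RNG_NS}
    # Index every <define> by its name so <ref> targets can be looked up.
    defines = {d.get("name"): d for d in root.iter(f"{{{RNG_NS}}}define")}
    result = {}
    for element in root.iter(f"{{{RNG_NS}}}element"):
        name = element.get("name")
        if name is None:
            continue  # skip elements declared via a name class
        children = set()
        for ref in element.iter(f"{{{RNG_NS}}}ref"):
            target = defines.get(ref.get("name"))
            if target is None:
                continue
            # Only elements declared directly under the referenced define.
            for child in target.findall("rng:element", ns):
                if child.get("name"):
                    children.add(child.get("name"))
        result[name] = children
    return result


if __name__ == "__main__":
    print(direct_child_elements(SAMPLE))
```

A real schema nests patterns much more deeply, so a usable extractor would have to resolve references recursively and handle choices and name classes.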