I need to validate XML with dynamic attribute names, like data-*. I'm currently using a RelaxNG schema, but it does not support dynamic attribute names. What are my options? I cannot find anything relevant.
Example of XML:
<?xml version="1.0" encoding="utf-8"?>
<body xml:lang="cs" ns="www.x.y">
<h id="x" ctime="2017-09">Heading..</h>
<desc kw="kw">Desc..</desc>
<section>
<h data-foo="bar" id="one" short="One">First heading</h>
<desc>Desc...</desc>
<p>Content..</p>
<ul data-buz="fuz">
<li data-switch="click">list item</li>
<li>list item 2</li>
</ul>
</section>
</body>
Preprocess the XML to drop the data-* attributes before giving it to the validation function. There is otherwise no way I know of to validate it with RelaxNG or other grammar-based schema languages.

As far as preprocessing the XML goes, one way to do that with an existing XML toolchain would be to run it through an XSLT transformation that drops the data-* attributes but passes on all else as-is.

In such a stylesheet, the <xsl:template match="@*[starts-with(name(), 'data-')]"/> rule is the important part. It causes any data-* attribute to just be dropped on the floor. The rest of the stylesheet is just a basic “identity transform” that passes on everything else from the source XML as-is.

The W3C Nu Html Checker (HTML5 validator) backend does something for data-* attributes that’s functionally the same as that XSLT transformation, but written in Java. If you’re curious, the code for it is in the GitHub repo for the W3C Nu Html Checker sources, here: https://github.com/validator/validator/tree/master/src/nu/validator/xml/dataattributes
See the filterAttributes code in DataAttributeDroppingContentHandlerWrapper.java.
It’s essentially a SAX filter that works at parse time off parse events prior to the validation function.
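To make the SAX-filter idea concrete, here is a minimal sketch using only the JDK's built-in SAX and JAXP classes. It is not the Nu Html Checker code: the class name DataAttributeDroppingFilter and the strip helper are made up for illustration, and the filtered events are serialized back to a string only so the result can be inspected. In a real pipeline you would hand the filter to your validator as its XMLReader instead.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical class for illustration; not the Nu Html Checker implementation.
class DataAttributeDroppingFilter extends XMLFilterImpl {

    DataAttributeDroppingFilter(XMLReader parent) {
        super(parent);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        // Copy every attribute except no-namespace attributes named data-*.
        AttributesImpl kept = new AttributesImpl();
        for (int i = 0; i < atts.getLength(); i++) {
            boolean isDataAttr =
                    atts.getURI(i).isEmpty() && atts.getQName(i).startsWith("data-");
            if (!isDataAttr) {
                kept.addAttribute(atts.getURI(i), atts.getLocalName(i), atts.getQName(i),
                        atts.getType(i), atts.getValue(i));
            }
        }
        // Pass the element on to the downstream handler with the reduced attribute set.
        super.startElement(uri, localName, qName, kept);
    }

    // Demo helper (an assumption of this sketch): parse, filter, serialize to a string.
    static String strip(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader filter = new DataAttributeDroppingFilter(factory.newSAXParser().getXMLReader());

        // A Transformer created without a stylesheet just copies its input to its output.
        Transformer copy = TransformerFactory.newInstance().newTransformer();
        copy.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        copy.transform(new SAXSource(filter, new InputSource(new StringReader(xml))),
                new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<ul data-buz=\"fuz\"><li data-switch=\"click\" id=\"a\">list item</li></ul>";
        System.out.println(strip(xml));
    }
}
```

Because the filter sits between the parser and whatever consumes the events, the validator never sees the data-* attributes, and the schema does not need to know about them.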
And if you’re even more curious, there is code for other preprocessing filters doing similar things:
nu.validator.xml.customelements.NamespaceChangingContentHandlerWrapper: filters out custom elements by putting them in a special namespace whose elements the accompanying RelaxNG grammar allows to occur essentially anywhere
nu.validator.xml.templateelement.TemplateElementDroppingContentHandlerWrapper: filters out template element subtrees, essentially just dropping them on the floor, because the HTML spec allows template subtrees to contain basically anything; so there’s no need to have the validation function do any checking on those template subtrees at all

Anyway, you get the general idea: If there are any cases of markup constructs in your source that you can’t express validation logic for in RelaxNG or XSD, then you essentially filter (preprocess) the source to hide that markup from the validation function.
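For the XSLT route mentioned earlier, here is a rough sketch of what such a stylesheet could look like, run through the JDK's built-in XSLT 1.0 processor. Only the data-* template comes from the answer itself. The identity template, the xsl:output setting, and the class and method names (DropDataAttributesXslt, strip) are assumptions made for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical class for illustration.
class DropDataAttributesXslt {

    // Identity transform plus one empty template that matches data-* attributes.
    // The predicate gives the second template a higher default priority, so it wins
    // over the generic copy rule for those attributes.
    static final String STYLESHEET =
          "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "  <xsl:output omit-xml-declaration=\"yes\"/>"
        + "  <!-- Identity transform: copy every node and attribute as-is -->"
        + "  <xsl:template match=\"@*|node()\">"
        + "    <xsl:copy><xsl:apply-templates select=\"@*|node()\"/></xsl:copy>"
        + "  </xsl:template>"
        + "  <!-- Empty template: data-* attributes produce no output -->"
        + "  <xsl:template match=\"@*[starts-with(name(), 'data-')]\"/>"
        + "</xsl:stylesheet>";

    // Applies the stylesheet to an XML string and returns the transformed XML.
    static String strip(String xml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<h data-foo=\"bar\" id=\"one\" short=\"One\">First heading</h>";
        System.out.println(strip(xml));
    }
}
```

The output of strip is what you would then pass to the RelaxNG validator in place of the original document.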