I have legacy XML documents that contain nested (non-root) elements that I want to validate against an XML Schema. The schema itself does not describe the XML document as a whole, but only a particular nested element.
The XML document resembles a message received from a 3rd party system; it has no xmlns attributes and not even an XML declaration. It's a legacy thing that I cannot influence. Example:
<XM>
<MH> … nested header elements … </MH>
<MD>
<RECSET>
… payload elements go here …
</RECSET>
</MD>
</XM>
My aim is to validate /XM/MD/RECSET against an XSD which defines the RECSET element and any payload elements nested within. I do not have schemas that would describe the outer elements, i.e. XM, MH, MD. I could modify all existing schemas and add dummy definitions, e.g. allowing for xs:all, but that is not preferred.
The validation is an optional step in a processing pipeline, and I want to avoid unnecessarily repeated XML parsing and other processing which adds execution time (throughput is important).
Another constraint is that I want to use XmlDocument, because down the processing pipeline I need an XmlDocument instance to perform deserialization into an object model using XmlSerializer. Again, this is an existing solution that I want to preserve.
My attempt is as follows:
// build an XmlDocument instance as the intermediate format of the message
var xml = new XmlDocument();
xml.LoadXml(msg.TransportMessage);
// obtain a pre-cached XmlSchemaSet instance matching the message represented by XmlDocument
XmlSchemaSet schemaSet = … ;
// find the whole payload represented by the RECSET element
var nodeToValidate = xml.SelectSingleNode("/XM/MD/RECSET");
// attach schemas to the document and validate the payload node
xml.Schemas = schemaSet;
xml.Validate(ValidationCallback, nodeToValidate);
This results in an error:
Schema information could not be found for the node passed into Validate. The node may be invalid in its current position. Navigate to the ancestor that has schema information, then call Validate again.
I've looked into the implementation of XmlDocument and the DocumentSchemaValidator class, which, in case of specific node validation, searches the DOM for schema information. Hence I tried attaching a reference to the correct schema to the node ad hoc:
XmlAttribute noNamespaceAttribute = xml.CreateAttribute("xsi:noNamespaceSchemaLocation", "http://www.w3.org/2001/XMLSchema-instance");
foreach (XmlSchemaElement x in schemaSet.GlobalElements.Values)
{
if (x.Name == "RECSET")
{
noNamespaceAttribute.InnerText = x.SourceUri!;
break;
}
}
nodeToValidate.Attributes!.Append(noNamespaceAttribute);
However, that results in the very same error message.
A working way to achieve such validation is to take the nodeToValidate.OuterXml and parse it either using a validating XmlReader or a new XmlDocument instance. However, that leads to another overhead in terms of memory and CPU. I'd rather avoid this route.
Is there a way to tell the validation engine to validate a particular node against an explicitly specified schema?
Your problem is that XmlDocument.Schemas is intended to represent the schema for the entire document.
In your case you have no schema for the entire document, so when you attempt to validate a particular node of the document by setting XmlDocument.Schemas to be the schema for that child node, validation fails, perhaps because the validation code is unable to navigate through the root document's schema (which doesn't exist) to find the specific child schema for the child element to be checked.
Options for a workaround depend on what you are trying to accomplish when you call XmlDocument.Validate(ValidationEventHandler, XmlNode). As explained in the docs, this method actually performs two distinct but related actions:
1. As expected, it validates the XML data in the XmlNode object against the schemas contained in the Schemas property.
2. It also performs infoset augmentation.
Action #1 seems clear, but what exactly is infoset augmentation? This isn't clearly documented, but one effect is to populate the contents of XmlNode.SchemaInfo. For instance, using the XML and XSD from https://www.w3schools.com/xml/schema_example.asp as an example, if I validate the root element against the XSD and check the contents of DocumentElement.SchemaInfo before and after, the result clearly shows that DocumentElement.SchemaInfo has been populated. Demo fiddle #1 here.
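To make the effect on SchemaInfo concrete, here is a minimal sketch. It does not use the w3schools files; the tiny `note` schema, the sample XML, and the `SchemaInfoDemo` class are hypothetical stand-ins introduced for illustration. It also assumes the whole-document overload `Validate(ValidationEventHandler)` rather than validation of a single node.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

// Hypothetical demo class; not part of the original answer.
public static class SchemaInfoDemo
{
    // Assumed stand-in schema: a single global element with one child.
    const string Xsd = @"<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>
  <xs:element name='note'>
    <xs:complexType>
      <xs:sequence>
        <xs:element name='to' type='xs:string'/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>";

    const string Xml = "<note><to>Tove</to></note>";

    // Returns the root element's SchemaInfo.Validity before and after validation.
    public static (XmlSchemaValidity Before, XmlSchemaValidity After) Run()
    {
        var schemaSet = new XmlSchemaSet();
        using (var schemaReader = XmlReader.Create(new StringReader(Xsd)))
        {
            // A null target namespace means "use the one declared in the schema".
            schemaSet.Add(null, schemaReader);
        }

        var doc = new XmlDocument();
        doc.LoadXml(Xml);

        // Capture the enum value, not the SchemaInfo object, so that the
        // "before" snapshot cannot change underneath us.
        var before = doc.DocumentElement!.SchemaInfo.Validity;

        doc.Schemas = schemaSet;
        doc.Validate((sender, e) => Console.WriteLine(e.Message));

        var after = doc.DocumentElement!.SchemaInfo.Validity;
        return (before, after);
    }
}
```

Calling `SchemaInfoDemo.Run()` and printing both values is one way to observe the augmentation: the validity recorded on the root element should differ before and after the call to Validate.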
Further, it seems that XmlDocument.Validate(ValidationEventHandler, XmlNode) may actually insert additional nodes into the document; see "XmlDocument.NodeInserted triggered on XmlDocument.Validate()" for one such example.
But do you really need to modify your XmlDocument via infoset augmentation, or do you just need to perform a read-only validation?
If you don't need infoset augmentation, you may validate an XmlNode by constructing an XmlNodeReader from it and then using the reader for read-only validation. This is conveniently packaged as extension methods on XmlNode.
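The read-only approach can be sketched roughly as follows. The class name `XmlNodeValidationExtensions` and the method name `ValidateReadOnly` are hypothetical, chosen for illustration; the sketch relies only on XmlNodeReader, XmlReaderSettings, and XmlReader.Create(XmlReader, XmlReaderSettings). Turning on ReportValidationWarnings is an assumption on my part: without it, a node for which no schema declaration is found would pass silently.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;
using System.Xml.Schema;

// Hypothetical helper; the names are illustrative only.
public static class XmlNodeValidationExtensions
{
    // Validates the given node (and its descendants) against schemaSet without
    // touching XmlDocument.Schemas. Returns every reported validation event.
    public static IReadOnlyList<ValidationEventArgs> ValidateReadOnly(
        this XmlNode node, XmlSchemaSet schemaSet)
    {
        if (node == null) throw new ArgumentNullException(nameof(node));
        if (schemaSet == null) throw new ArgumentNullException(nameof(schemaSet));

        var events = new List<ValidationEventArgs>();

        var settings = new XmlReaderSettings
        {
            ValidationType = ValidationType.Schema,
            Schemas = schemaSet,
        };
        // Assumption: also surface warnings such as "could not find schema
        // information", which are suppressed by default.
        settings.ValidationFlags |= XmlSchemaValidationFlags.ReportValidationWarnings;
        settings.ValidationEventHandler += (sender, e) => events.Add(e);

        // XmlNodeReader walks the existing DOM, so the XML text is not re-parsed.
        using (var nodeReader = new XmlNodeReader(node))
        using (var validatingReader = XmlReader.Create(nodeReader, settings))
        {
            // Validation happens as a side effect of reading to the end.
            while (validatingReader.Read()) { }
        }

        return events;
    }
}
```

With the variables from the question, usage would look something like `var problems = nodeToValidate.ValidateReadOnly(schemaSet);`, after which an empty list indicates that no errors or warnings were reported for the RECSET subtree.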
Note that you never set XmlDocument.Schemas with this approach. Demo fiddle #2 here.
If you do need infoset augmentation you will need to rethink your approach, possibly by programmatically generating a plausible XmlSchema for the <XM><MD>...</MD></XM> wrapper elements at runtime and adding it to XmlDocument.Schemas before validation.