My question may be naive or missing information, and I apologize for that. I will try to be as descriptive as I think is needed.
We have a lot of XML files, and there may be hundreds of schemas (different root/element names, different attributes). While writing these files, many developers have incorrectly added mixed content (the one rule common to all our schemas is "no mixed content").
We want to re-indent the XML files, but the mixed content mentioned above is causing problems. The only XML parser/utility we have is xmllint (we can't get other utilities due to some constraints).
For example:
<A>
mixed data<B>
<C>text data</C>
</B>
<D>new data</D>
</A>
After running the following (the options I passed to xmllint are just things I tried at random):
xmllint --recover --encode "ISO-8859-1" --format data.xml
I get the following:
<?xml version="1.0" encoding="ISO-8859-1"?>
<A>
mixed data<B><C>text data</C></B>
<D>new data</D>
</A>
I have used A-D as example tag names, since the elements in our XML files can have hundreds of possible names. I would like help with the following two things:
1) Find out which files have mixed content. Since we have lots of files, I would prefer to use a bash script (or any similar script).
2) A way to properly format the XML files.
Any help would be deeply appreciated. I have been banging my head against this for some time, and something that looks quite simple has proven to be quite tough for me. Other information about our system: we are on Unix, and we can use Perl if required (but we don't have XML::Twig or XML::LibXML::PrettyPrint).
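For item 1, a minimal sketch of one possible check follows. It assumes the installed xmllint supports the `--xpath` option (older builds may not), and it treats "mixed content" as any element that has both child elements and non-whitespace text directly inside it. The function names `count_mixed` and `list_mixed_files` are hypothetical helpers made up for this illustration.

```shell
#!/bin/sh
# Sketch only: assumes xmllint was built with --xpath support.

# Hypothetical helper: print how many elements in "$1" have both child
# elements and non-whitespace text directly inside them.
count_mixed() {
    xmllint --xpath 'count(//*[*][text()[normalize-space()]])' "$1" 2>/dev/null
}

# Hypothetical helper: print one line per file that has mixed content,
# or that xmllint could not parse at all.
list_mixed_files() {
    for f in "$@"; do
        n=$(count_mixed "$f")
        if [ -z "$n" ]; then
            # No output means xmllint failed on this file.
            echo "UNPARSEABLE: $f"
        elif [ "$n" != "0" ]; then
            echo "MIXED ($n): $f"
        fi
    done
}

# Check every file named on the command line.
list_mixed_files "$@"
```

Saved as, say, findmixed.sh, it could be run as `sh findmixed.sh *.xml`. For the A-D example above it should flag the file because of the "mixed data" text directly inside A, while B (which contains only whitespace around C) is not counted.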