I have my reasons for replacing the colon (:) with an underscore in all tag names (please don't ask why; it is not relevant to the question).
What is relevant to the question is that I would like to take this XML:
<data:data>
<another:data>Content</another:data>
<another:data>Content</another:data>
<another:data>Content</another:data>
<another:data attribute="attr : content">This content should : not be affected</another:data>
<another:data><![CDATA[This content should : not be affected]]></another:data>
</data:data>
Replace with:
<data_data>
<another_data>Content</another_data>
<another_data>Content</another_data>
<another_data>Content</another_data>
<another_data attribute="attr : content">This content should : not be affected</another_data>
<another_data><![CDATA[This content should : not be affected]]></another_data>
</data_data>
But what is the best way to perform this with PHP?
I know that regex is not a proper way to parse HTML or XML, but I'm afraid I'm stuck with preg_replace() in my situation, because DOMDocument() can't read my ~250K rows of badly structured, namespaced XML content. The provided XSD files (~25 schemas) have been outdated for 6 years now, and the content provider is unwilling to fix this.
I found out that SimpleXMLElement() works after replacing the : with _.
You can capture what is between < and >, then replace : with _ in the tag name only.
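A minimal sketch of that idea, assuming PHP 7 or later. The function name underscoreTagNames() and the exact pattern are illustrative and are not taken from the original answer. The pattern matches whole CDATA sections and comments first so they can be returned unchanged, and otherwise rewrites only a prefix:name pair that directly follows < or </, which leaves attribute values and text content alone.

```php
<?php
// Hypothetical helper: replaces the colon in element names only.
function underscoreTagNames(string $xml): string
{
    $pattern = '~<!\[CDATA\[.*?\]\]>'   // alternative 1: a whole CDATA section
             . '|<!--.*?-->'             // alternative 2: a whole comment
             . '|<(/?)([A-Za-z_][\w.-]*):([A-Za-z_][\w.-]*)~s'; // alternative 3: <prefix:name or </prefix:name

    $result = preg_replace_callback($pattern, function (array $m): string {
        // CDATA sections and comments set no capture groups: keep them as they are.
        if (!isset($m[2])) {
            return $m[0];
        }
        // Rebuild the start of the tag with "_" in place of ":".
        return '<' . $m[1] . $m[2] . '_' . $m[3];
    }, $xml);

    // preg_replace_callback() returns null on failure (for example a PCRE limit being hit).
    if ($result === null) {
        throw new RuntimeException('preg error code: ' . preg_last_error());
    }
    return $result;
}

$xml = <<<'XML'
<data:data>
<another:data>Content</another:data>
<another:data attribute="attr : content">This content should : not be affected</another:data>
<another:data><![CDATA[This content should : not be affected]]></another:data>
</data:data>
XML;

echo underscoreTagNames($xml), PHP_EOL;
```

With the sample from the question, this should turn data:data and another:data into data_data and another_data while keeping the attribute value, the text, and the CDATA section untouched. Prefixed attribute names (for example xsi:type) are deliberately not rewritten, so they may still need separate handling before the result is passed to SimpleXMLElement().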