I need to write a Java regex substitution (preferably purely declarative) that will do these analogous transformations:
1)
<a nonStandardAttrName="">Some visible text</a>
to
<span class="invalid"><a nonStandardAttrName=""></span>
Some visible text
<span class="invalid"></a></span>
and
2)
<b nonStandardAttrName="">Some visible text</b>
to
<span class="invalid"><b nonStandardAttrName=""></span>
Some visible text
<span class="invalid"></b></span>
3) etc. (i.e. the tag could be a, b, foobar - anything)
(Please don't propose a different approach. The app actually uses spans rather than comments, but I must use this "suppress by surrounding the tags" approach; it is not my design decision.)
Is it possible to do this with regular expressions? Matching the opening tag is easy because I can match the substring nonStandardAttrName, but how would I find the closing tag? Is there some kind of regular expression operation that says "whatever I captured earlier, look for that again later on"? If the tags were some finite set, I could hardcode those tag names in the regex, but in my situation the tags could be any of a large number of non-standard tags. The closest thing I know of is substitution backreferencing, but that is only for output, not input.
What I'm trying to do
It's really not relevant, but my HTML parser will throw away markup that isn't standard HTML. But I need to preserve whatever quirky input the user has given me. So I need to escape it before parsing and unescape after parsing (by using comments or spans).
It's easy, actually. You can put backreferences in the input expression: inside the pattern, \1 matches whatever group 1 captured earlier in the same match. Backreferences are not restricted to output substitution expressions.
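As a rough sketch of that idea (not a robust HTML solution), the pattern below captures the tag name in group 1 and then uses `\1` to require the same name in the closing tag. The class name `TagWrapper`, the method `wrap`, and the exact shape of the pattern are illustrative assumptions; it also assumes the marker attribute is literally `nonStandardAttrName`, that tag names consist of word characters, and that tags with the same name are not nested inside each other.

```java
import java.util.regex.Pattern;

class TagWrapper {
    // Group 1: tag name (hypothetical assumption: word characters only).
    // Group 2: the attribute text, which must contain nonStandardAttrName.
    // Group 3: the element body, matched reluctantly.
    // \1 in the pattern is an in-pattern backreference to the captured tag name.
    private static final Pattern NON_STANDARD = Pattern.compile(
        "<(\\w+)(\\s[^>]*?nonStandardAttrName[^>]*)>(.*?)</\\1>",
        Pattern.DOTALL);

    // $1, $2, $3 are the replacement-side references to the same groups.
    private static final String REPLACEMENT =
        "<span class=\"invalid\"><$1$2></span>$3<span class=\"invalid\"></$1></span>";

    static String wrap(String html) {
        return NON_STANDARD.matcher(html).replaceAll(REPLACEMENT);
    }

    public static void main(String[] args) {
        System.out.println(wrap("<a nonStandardAttrName=\"\">Some visible text</a>"));
        System.out.println(wrap("<foobar nonStandardAttrName=\"\">Some visible text</foobar>"));
    }
}
```

This sketch emits the wrapped tags and the text on a single line, without the line breaks shown in the question. Because the body is matched reluctantly up to the first closing tag with the same name, nested elements of the same name would be paired incorrectly; a single regex cannot handle that case in general.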