Replace HTML inside a pre tag using regex


How can I replace the HTML inside a pre tag? I would prefer to do it with a regex.

<html>
<head></head>
<body>
<div>
<pre>

    <html>
    <body>
    -----> hello! ----< 
    </body>
    </html

</pre>
</div>
</body>


Answer by Samuel Lampa

EDIT: As another answer points out, regular expressions cannot reliably handle HTML or XHTML in general, so you are better off using an HTML parser instead. I'm leaving my answer here for reference, though.
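Following that advice, here is a minimal sketch using PHP's built-in DOMDocument class (PHP is used only to match the snippet in this answer). The helper name replacePreContents is hypothetical. It removes every child node of each pre element and inserts a single text node, so the pre tags and their attributes (such as class) are kept. Note that libxml's HTML parser repairs malformed markup in its own way, so the serialized output is not guaranteed to match the input byte for byte.

```php
<?php
// Parser-based alternative: replace the contents of every <pre> element.
// replacePreContents() is a hypothetical helper written for illustration.
function replacePreContents(string $html, string $replacement): string
{
    $doc = new DOMDocument();

    // Collect libxml warnings instead of printing them; real-world HTML is often malformed.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('pre') as $pre) {
        // Remove all existing children (text and nested elements) of this <pre>.
        while ($pre->firstChild !== null) {
            $pre->removeChild($pre->firstChild);
        }
        // A text node is escaped (<, >, &) when the document is serialized.
        $pre->appendChild($doc->createTextNode($replacement));
    }

    return $doc->saveHTML();
}

$html = '<html><head></head><body><div><pre class="some-css-class"><b>old</b> content</pre>'
      . '<pre>second block</pre></div></body></html>';

echo replacePreContents($html, '(pre tag content was here)');
```

Unlike a regex, this handles every pre element in the document and does not depend on how the contents are written.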

What do you want to replace the content inside the pre tags with?

I'm not familiar with the specific C# syntax, but provided C# uses Perl-style regexes, the following PHP snippet might be helpful. The code below replaces the content inside the pre tags with the string "(pre tag content was here)" (tested with the command-line PHP client):

<?php
$html = "<html><head></head><body><div><pre class=\"some-css-class\">
         <html><body>
         -----> hello! ----< 
         </body></html
         </pre></div></body>"; // Compacting things here, for brevity

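$newHTML = preg_replace("/(.*?<pre[^<>]*>)(.*?)(<\/pre>.*)/s", "$1(pre tag content was here)$3", $html);
echo $newHTML;
?>

The ? mark makes the matching non-greedy (it stops at the first occurrence of what comes after), and the s modifier enables "single-line" mode, which is important because it makes . match newlines as well. (Avoid the uppercase U modifier here: in PHP it means "ungreedy" and inverts the meaning of ?. Unicode support is the lowercase u modifier.) The [^<>]* part supports attributes in the pre tag, such as <pre class="some-css-class">: it matches any number of characters except < or >. The opening <pre ...> tag is captured with the text before it in $1, and the closing </pre> with the text after it in $3, so both tags survive the replacement. Because the final .* consumes the rest of the input, only the first pre element is replaced.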
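If you want to replace the contents of every pre element rather than only the first, a minimal sketch is to drop the leading and trailing groups, so preg_replace can find one match per element. This assumes the pre elements are not nested and that the text "</pre>" never appears inside one (for example in a comment or script); the \b after "pre" is an addition so that a tag such as <preview> is not matched.

```php
<?php
// Sketch: replace the contents of *every* <pre> element with a regex.
// Assumes no nested <pre> elements and no literal "</pre>" inside their contents.
$html = "<div><pre class=\"some-css-class\">\n  <html><body>\n  -----> hello! ----<\n  </body></html\n</pre>"
      . "<p>between</p><pre>second block</pre></div>";

// (<pre\b[^<>]*>) captures the opening tag (with any attributes) as $1,
// .*? lazily matches the contents up to the first following </pre>,
// (<\/pre>) captures the closing tag as $2. The s modifier lets . match newlines.
$newHTML = preg_replace('/(<pre\b[^<>]*>).*?(<\/pre>)/s', '$1(pre tag content was here)$2', $html);

echo $newHTML, "\n";
// <div><pre class="some-css-class">(pre tag content was here)</pre><p>between</p><pre>(pre tag content was here)</pre></div>
```

Single quotes are used for the pattern and the replacement so that PHP does not try to interpret $1, $2 or the backslashes.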

UPDATE: As indicated by Martinho Fernandes in the comments below, the C# syntax for the above regex should be something like:

new Regex(@"(.*?<pre[^<>]*>)(.*?)(<\/pre>.*)", RegexOptions.Singleline)
Answer by Theun Arbeider