How to replace Html Comment <!-- comment --> tags with string.Empty

I am trying to remove all the HTML comment tags from my HtmlNode. The following is the HtmlNode selection code from my C#:

HtmlNode table = doc5.DocumentNode.SelectSingleNode("//div[@id='div12']");

The returned HtmlNode contains markup like the following (pseudo markup):

<table>
  <tr>
    <td>test</td>
    <td>
      <!-- <a href='url removed' >Test link Test 2 Comment </a> -->
    </td>
  </tr>
</table>

I managed to write a regular expression that resolves my issue, but it only worked in my test run, where the input was a string. See the C# code below.

string rkr;
rkr = "<!-- <a href='url removed' >Test link Test 2 Comment </a> -->";
rkr = Regex.Replace(rkr, @"(\<!--\s*.*?((--\>)|$))",String.Empty);

Result = "". which is what I want in live run for all the tags.

I have seen many code examples on forums and on Stack Overflow, but nothing is close to what I want. One post was really useful, but it was for PHP, so again no use.

Now, if I pass the HtmlNode itself to the Regex.Replace call above:

rkr = Regex.Replace(table, @"(\<!--\s*.*?((--\>)|$))",String.Empty);

I get the following error:

The best overloaded method match for 'System.Text.RegularExpressions.Regex.Replace(string, System.Text.RegularExpressions.MatchEvaluator, int)' has some invalid arguments

I also tried converting the node to a string:

rkr = Regex.Replace(table.ToString(), @"(\<!--\s*.*?((--\>)|$))",String.Empty);

But then I get rkr = "HtmlAgilityPack.HtmlNode" as the return value.
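A note on the last attempt: HtmlNode apparently does not override ToString(), so the call returns the type name rather than the markup. The markup itself is exposed as a string through the OuterHtml (or InnerHtml) property. Below is a minimal sketch of the regex route, assuming that string is what gets passed in. StripComments is a hypothetical helper, not part of HtmlAgilityPack, and the pattern is a simplified one with RegexOptions.Singleline so that comments spanning several lines also match. Regex-based HTML stripping is fragile, so treat this as an illustration only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CommentStripper
{
    // Hypothetical helper (not from the original post): removes every
    // <!-- ... --> block from a string of markup.
    public static string StripComments(string html)
    {
        // Singleline makes '.' match newlines, so multi-line comments are removed too.
        // The lazy '.*?' stops at the first closing '-->'.
        return Regex.Replace(html, @"<!--.*?-->", string.Empty, RegexOptions.Singleline);
    }

    public static void Main()
    {
        string markup = "<td>test</td><td><!-- <a href='url removed'>Test link\n Test 2 Comment </a> --></td>";
        Console.WriteLine(StripComments(markup)); // prints <td>test</td><td></td>
    }
}
```

With the node from the question this would be called as StripComments(table.OuterHtml). Note that the result is a new string; the HtmlNode itself is left unchanged.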

Any help would be appreciated.

There are 2 answers

0
Jag On BEST ANSWER

Thank you all for your help. I found a solution in the following function.

I just call the function after populating doc5, as follows:

HtmlNode table = doc5.DocumentNode.SelectSingleNode("//div[@id='div12']");

RemoveComments(table);

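// Recursively removes every comment node at or below the given node.
// ToArray() requires "using System.Linq;".
public static void RemoveComments(HtmlNode node)
{
    // Iterate over a copy of the children, because removing a child
    // modifies node.ChildNodes while it is being walked.
    foreach (var n in node.ChildNodes.ToArray())
        RemoveComments(n);

    if (node.NodeType == HtmlNodeType.Comment)
        node.Remove();
}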

For reference: I found the answer in the following post: "How to select node types which are HtmlNodeType Comment using HTMLAgilityPack".

It is very precise and has many different example types, which is exactly what I was after.
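As a possible alternative to the recursive walk, the same nodes can be selected with XPath. This is a minimal sketch, assuming HtmlAgilityPack's XPath support for the comment() node test; RemoveCommentsXPath and the sample markup are made up for illustration and are not taken from the referenced post.

```csharp
using System;
using HtmlAgilityPack;

public static class XPathCommentRemoval
{
    // Hypothetical helper: removes all comment nodes below 'node' using XPath.
    public static void RemoveCommentsXPath(HtmlNode node)
    {
        // ".//comment()" selects comment nodes among the descendants of 'node'.
        HtmlNodeCollection comments = node.SelectNodes(".//comment()");
        if (comments == null)
            return; // SelectNodes returns null when nothing matches

        // 'comments' is a separate collection of the matches, so removing
        // each node from its parent does not disturb this loop.
        foreach (HtmlNode comment in comments)
            comment.Remove();
    }

    public static void Main()
    {
        var doc5 = new HtmlDocument();
        doc5.LoadHtml("<div id='div12'><table><tr><td>test</td><td><!-- <a href='url removed'>Test link</a> --></td></tr></table></div>");

        HtmlNode table = doc5.DocumentNode.SelectSingleNode("//div[@id='div12']");
        RemoveCommentsXPath(table);

        Console.WriteLine(table.OuterHtml); // markup without the comment
    }
}
```

The null check matters because SelectNodes does not return an empty collection when the node contains no comments.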

1
Alexander Polyankin On

Answered here:

doc5.DocumentNode.Descendants()
    .Where(n => n.NodeType == HtmlAgilityPack.HtmlNodeType.Comment)
    .ToList()
    .ForEach(n => n.Remove());

Note: ToList is necessary because you cannot modify a sequence while you are enumerating it.
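For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same LINQ approach, scoped to the div from the question instead of the whole document. The sample markup and the RemoveCommentsLinq name are assumptions made for the example.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

public static class LinqCommentRemoval
{
    // Hypothetical helper: removes all comment nodes below 'node'.
    public static void RemoveCommentsLinq(HtmlNode node)
    {
        node.Descendants()
            .Where(n => n.NodeType == HtmlNodeType.Comment)
            .ToList()                   // take a snapshot before changing the tree
            .ForEach(n => n.Remove());  // safe: we iterate the snapshot, not the live tree
    }

    public static void Main()
    {
        var doc5 = new HtmlDocument();
        doc5.LoadHtml("<div id='div12'><table><tr><td>test</td><td><!-- <a href='url removed'>Test link</a> --></td></tr></table></div>");

        HtmlNode table = doc5.DocumentNode.SelectSingleNode("//div[@id='div12']");
        RemoveCommentsLinq(table);

        Console.WriteLine(table.OuterHtml); // markup without the comment
    }
}
```

Descendants() is evaluated lazily while the tree is walked, which is why the matches are copied into a list before any node is removed.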