I need to remove HTML tags from a string:
std::string whole_file("<imgxyz width=139\nheight=82 id=\"_x0000_i1034\" \n src=\"cid:[email protected]\" \nalign=baseline border=0> \ndfdsf");
When I use the RE2 library to remove the pattern with
RE2::GlobalReplace(&whole_file,"<.*?>"," ");
the HTML tags are not removed. But when I use
RE2::GlobalReplace(&whole_file,"<.*\n.*\n.*?>"," ");
the HTML tags are removed. Why is that? Can anyone suggest a better regular expression to remove HTML tags from a file?
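Wild guess: . does not match the EOL character, so "<.*?>" cannot match a tag that spans several lines. You could use:
"<(.|\n)*?>"
to let the match run across any number of newline characters. Note that "<[.\n]*?>" would not work: inside a character class the dot is a literal dot, so [.\n] only matches dots and newlines.
RE2 also supports the s flag, which makes . match newlines:
"(?s)<.*?>"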
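Another option is to avoid . altogether. A negated character class such as [^>] matches newlines as well, so "<[^>]*>" covers multi-line tags without any flag. The sketch below illustrates that idea with std::regex instead of RE2, only so that it builds with the standard library alone; strip_tags and strip_tags_dot are hypothetical helper names, not part of either library. It is a rough illustration, not a robust HTML parser (a '>' inside an attribute value would end the match early).

```cpp
#include <regex>
#include <string>

// Hypothetical helper: replaces every <...> tag with a single space.
// [^>]* matches any character except '>', including '\n', so a tag
// that spans several lines is still matched as one unit.
std::string strip_tags(const std::string& input) {
    static const std::regex tag("<[^>]*>");
    return std::regex_replace(input, tag, " ");
}

// For comparison: in std::regex's default ECMAScript grammar, '.' does
// not match line terminators, so a tag containing '\n' is left untouched.
std::string strip_tags_dot(const std::string& input) {
    static const std::regex tag("<.*?>");
    return std::regex_replace(input, tag, " ");
}
```

With RE2 the same pattern should work as RE2::GlobalReplace(&whole_file, "<[^>]*>", " "); assuming the default options, under which a negated class matches newlines.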