
Creating a regex with special characters in Web Harvest


I am using Web Harvest (http://web-harvest.sourceforge.net/), the open source web scraping tool.

The regex I am trying to use contains "<" and ">" characters (because I am trying to strip out all HTML tags that come in). This causes a problem because the configuration file is XML, so the parser complains that the content of the elements must consist of well-formed character data or markup.

I need to somehow escape the regex, but can't figure out how.

Any ideas?


There is 1 answer

Answer by Mark Byers:

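To make the regular expression well-formed XML, try replacing < with &lt; and > with &gt;. Similarly, if you have an & in your regular expression, you will need to replace it with &amp;. The XML parser turns these entities back into the original characters when it reads the file, so the regex engine still sees the pattern you intended.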
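
As a rough illustration of why the escaping is safe, here is a minimal Python sketch (standard library only, not Web Harvest itself). The tag-matching pattern and the `regexp-pattern` element name are assumptions made for the example; substitute whatever your configuration actually uses.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Illustrative regex that matches an HTML tag (an assumption, not the asker's pattern).
pattern = r"<[^>]+>"

# escape() replaces & first, then < and >, giving valid XML character data.
escaped = escape(pattern)

# Hypothetical element standing in for wherever the pattern sits in the config file.
config = f"<regexp-pattern>{escaped}</regexp-pattern>"

# Parsing the XML decodes the entities, so the original regex comes back unchanged.
decoded = ET.fromstring(config).text

# The decoded pattern behaves exactly like the unescaped one.
stripped = re.sub(decoded, "", "<p>Hello <b>world</b></p>")
```

Here `escaped` is the text you would paste into the XML file, and `decoded` is what the tool ends up handing to the regex engine after parsing.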

Also I'd suggest you use an HTML parser instead of a regular expression for this task.
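
To show what the parser-based approach looks like in general, here is a small sketch using Python's built-in `html.parser`. This is not a Web Harvest feature, just an illustration of the idea; the `TextExtractor` class and `strip_tags` function are names made up for this example.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects only the text content and ignores every tag."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Called for text between tags; entities are already decoded.
        self.parts.append(data)


def strip_tags(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)


text = strip_tags("<p>Hello <b>world</b> &amp; more</p>")
```

Unlike a regex, the parser also decodes entities such as &amp; in the text, and it does not depend on a hand-written pattern for what counts as a tag.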