Creating a regex with special characters in Web Harvest

Question

711 views · Asked by kburns on 10 February 2011 at 20:15

I am using Web Harvest (http://web-harvest.sourceforge.net/), the open source web scraping tool.

The regex I am trying to use contains "<" and ">" characters (because I am trying to strip out all HTML tags that come in). This causes a problem because the Web Harvest configuration is XML, and the parser reports that the content of elements must consist of well-formed character data or markup.

I need to somehow escape the regex, but I can't figure out how.

Any ideas?

There is 1 answer

**Mark Byers** · Answer 1 · 2011-02-10T20:17:37+00:00

You need to make the regular expression well-formed XML. Try replacing `<` with `&lt;` and `>` with `&gt;`. Similarly, if you have an `&` in your regular expression, you will need to replace that with `&amp;`.
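As a rough illustration of that escaping (not part of the original answer), here is a minimal Python sketch that escapes a tag-stripping pattern and checks that an XML parser hands the original pattern back. The pattern `<[^>]*>` and the helper name `escape_regex_for_xml` are assumptions made for this example, and the `regexp-pattern` element name is used only as a plausible wrapper, so check it against your own Web Harvest configuration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def escape_regex_for_xml(pattern: str) -> str:
    """Escape &, < and > so the pattern can sit inside XML element content.

    Hypothetical helper name; it simply wraps the standard library's escape().
    """
    return escape(pattern)


# Assumed example pattern: a naive "match any HTML tag" regex.
TAG_PATTERN = r"<[^>]*>"

escaped = escape_regex_for_xml(TAG_PATTERN)

# Assumed wrapper element, used here only to prove the fragment parses.
config_fragment = f"<regexp-pattern>{escaped}</regexp-pattern>"

# The XML parser decodes the entities, so the regex engine sees the raw pattern.
recovered = ET.fromstring(config_fragment).text

print(escaped)
print(recovered == TAG_PATTERN)
```

The point of the round trip is that the escaping only affects how the pattern is written in the configuration file; after parsing, the regular expression itself is unchanged.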

Also I'd suggest you use an HTML parser instead of a regular expression for this task.
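To show what the parser-based approach might look like outside Web Harvest, here is a small sketch using Python's standard `html.parser` module. The class name `TextExtractor` and the function `strip_tags` are made up for this example; this is one possible way to drop tags, not the method the answer had in mind.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes and ignores all tags (hypothetical helper class)."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        # Called for text between tags; tags themselves are never stored.
        self.parts.append(data)


def strip_tags(html: str) -> str:
    """Return the text content of an HTML string with the markup removed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)


print(strip_tags("<p>Hello <b>world</b></p>"))
```

Because the parser tokenizes the markup instead of pattern-matching on angle brackets, no regular expression needs to be escaped at all.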
