Need a RegEx guru

Question

Asked by Aaron on 08 December 2009 at 19:23 · 347 views

I'm trying to write a script that parses a block of HTML and matches words against a given glossary of terms. If it finds a match, it wraps the term in <a class="tooltip"></a> and provides a definition.

It's working okay -- except for two major shortcomings:

It matches text that is inside attributes.
It matches text that is already inside an <a> tag, creating a nested link.

Is there any way to have my regular expression match only words that are not in attributes, and not in <a> tags?

Here's the code I'm using, in case it's relevant:

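$search = array();
foreach (Glossary::map() as $term => $def) {
  // Escape the term so any regex metacharacters in it are matched literally.
  $search[] = '/\b(' . preg_quote($term, '/') . ')\b/i';
  // Key the lookup by upper-cased term so the callback can find it regardless of case.
  self::$lookup[strtoupper($term)] = $def;
}

// Run every term pattern over the content; replace() builds the tooltip link.
return preg_replace_callback($search, array($this, 'replace'), $this->content);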

There are 3 answers

**Tim Sylvester** · Answer 1 · 2009-12-08T19:29:44+00:00

"Don't do that with a regex."

Use an HTML parser, then apply a regex to the contents of HTML elements as it identifies them. That will allow you to easily operate on lots of different variants of HTML structure, valid and otherwise, without a lot of cruft and hard-to-maintain regular expressions.

Robust and Mature HTML Parser for PHP
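
As a rough illustration of the parser-first approach described in this answer, here is a minimal sketch using PHP's bundled DOM extension (`DOMDocument` and `DOMXPath`). It is not the asker's code: the function name `linkGlossaryTerms`, the `glossary-root` wrapper id, and the choice to put the definition in a `title` attribute are all assumptions made for the example, and it assumes PHP 7.4 or later. The regex only ever sees text nodes, so attribute values are never touched, and the XPath filter skips text that already sits inside an `<a>`.

```php
<?php
// Hypothetical helper, not from the original post: wraps glossary terms in
// <a class="tooltip"> only where they occur in text nodes outside an <a>.
function linkGlossaryTerms(string $html, array $glossary): string
{
    if (!$glossary) {
        return $html;
    }

    // Upper-cased lookup, mirroring the strtoupper() keys used in the question.
    $lookup = [];
    $quoted = [];
    foreach ($glossary as $term => $def) {
        $lookup[strtoupper((string) $term)] = $def;
        $quoted[] = preg_quote((string) $term, '/');
    }
    // Longest terms first, so a longer phrase wins over a term it starts with.
    usort($quoted, fn($a, $b) => strlen($b) - strlen($a));
    $pattern = '/\b(' . implode('|', $quoted) . ')\b/i';

    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // tolerate sloppy markup
    // The meta tag hints UTF-8; the wrapper div (an assumed id) lets us
    // pull the fragment back out after parsing.
    $doc->loadHTML(
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
        . '<div id="glossary-root">' . $html . '</div>'
    );
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $root = $xpath->query('//div[@id="glossary-root"]')->item(0);
    // text() never selects attribute values; the predicate skips existing links.
    $textNodes = $xpath->query('.//text()[not(ancestor::a)]', $root);

    // Copy to an array first because the loop modifies the tree.
    foreach (iterator_to_array($textNodes) as $textNode) {
        // With DELIM_CAPTURE, odd indexes hold matched terms, even indexes the text between.
        $parts = preg_split($pattern, $textNode->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
        if (count($parts) < 2) {
            continue; // no glossary term in this text node
        }
        $fragment = $doc->createDocumentFragment();
        foreach ($parts as $i => $part) {
            if ($i % 2 === 1) {
                $link = $doc->createElement('a');
                $link->setAttribute('class', 'tooltip');
                $link->setAttribute('title', $lookup[strtoupper($part)] ?? '');
                $link->appendChild($doc->createTextNode($part));
                $fragment->appendChild($link);
            } elseif ($part !== '') {
                $fragment->appendChild($doc->createTextNode($part));
            }
        }
        $textNode->parentNode->replaceChild($fragment, $textNode);
    }

    // Serialize only the children of the wrapper, not the whole document.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

Called as `linkGlossaryTerms('<p title="cache">The cache is warm. <a href="/cache">cache docs</a></p>', ['cache' => 'A fast store'])`, this should wrap only the "cache" in the sentence, leaving the `title` attribute and the existing link alone. The parser may normalize the surrounding markup, so the output is not guaranteed to be byte-for-byte identical to the input outside the wrapped terms.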

**Stephan Eggermont** · Answer 2 · 2009-12-08T19:31:05+00:00

HTML parsing is an interesting research topic. What do you mean by HTML? There are standards (quite a few), and there are web pages. Most researchers do not use regular expressions to parse HTML.

**Lee** · Answer 3 · 2009-12-08T19:33:01+00:00

Personally, I prefer this answer.
