Regex pattern to find occurrences of HTML tags


Say I have a string that looks like this:

iword/i

Here the tag is i. This is similar to an HTML tag, except without the <> angle brackets.

Or say I have:

emword/em

Here the tag is em.

What I want is a pattern that removes these tags.

I'm testing this pattern on http://rubular.com/, but it is not working properly:

<([A-Z][A-Z0-9]*)\b[^>]*>(.*?)</\1>

Specifically, I want to do this in Objective-C:

NSString *string = @"iword/i";
NSString *pattern = @"<([A-Z][A-Z0-9]*)\\b[^>]*>(.*?)</\\1>"; // the pattern above, backslashes doubled
NSError *error = nil;
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:pattern options:NSRegularExpressionCaseInsensitive error:&error];
return [regex stringByReplacingMatchesInString:string options:0 range:NSMakeRange(0, string.length) withTemplate:@""];

which should remove everything except word.

Best answer, from brandonscript:

Then you're going to need a complete list of the tags you want to remove (i, em, b, what else?). Without angle brackets there is nothing else that marks a tag, so you have to search for each tag name specifically.
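Because the tag names have to be spelled out, one option is to generate the alternation from an array instead of hard-coding it. This is a minimal sketch assuming Foundation's NSRegularExpression; PseudoTagPattern is a hypothetical helper name, and it builds the same shape of pattern as the answer below. Longer names are sorted first so that, for a hypothetical tag list containing both i and img, the shorter i cannot match the start of img.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: builds \b(names)(\w*)/(names)\b from a list of tag names.
static NSString *PseudoTagPattern(NSArray<NSString *> *tagNames) {
    // Longest names first, so "img" is tried before "i".
    NSArray<NSString *> *sorted = [tagNames sortedArrayUsingComparator:^NSComparisonResult(NSString *a, NSString *b) {
        if (a.length == b.length) {
            return NSOrderedSame;
        }
        return a.length > b.length ? NSOrderedAscending : NSOrderedDescending;
    }];

    NSMutableArray<NSString *> *escaped = [NSMutableArray arrayWithCapacity:sorted.count];
    for (NSString *name in sorted) {
        // Escape any regex metacharacters that might appear in a tag name.
        [escaped addObject:[NSRegularExpression escapedPatternForString:name]];
    }

    NSString *alternation = [escaped componentsJoinedByString:@"|"];
    // Backslashes are doubled because this is an Objective-C string literal.
    return [NSString stringWithFormat:@"\\b(%@)(\\w*)/(%@)\\b", alternation, alternation];
}
```

For example, PseudoTagPattern(@[@"i", @"em"]) produces \b(em|i)(\w*)/(em|i)\b.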

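One way of doing this is \b(i|em|b)(\w*)\/(i|em|b)\b, replacing each match with the second capture group ($2), which holds the text between the tags. As you've seen before with Obj-C, the backslashes will likely need to be doubled inside the string literal.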
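A minimal Objective-C sketch of this approach, assuming Foundation's NSRegularExpression (ICU regex syntax). StripPseudoTags is a hypothetical name. The important change from the question's code is the template @"$2" instead of @"", so the match is replaced by the word between the tags rather than deleted outright. In ICU syntax / is not a metacharacter, so the \/ escape from the answer is written as a plain /.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: removes the pseudo-tags i, em and b from strings like "iword/i".
static NSString *StripPseudoTags(NSString *string) {
    NSError *error = nil;
    // The answer's pattern, with every backslash doubled for the string literal.
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"\\b(i|em|b)(\\w*)/(i|em|b)\\b"
                                                  options:NSRegularExpressionCaseInsensitive
                                                    error:&error];
    if (regex == nil) {
        NSLog(@"Invalid pattern: %@", error);
        return string;
    }
    // "$2" keeps only the second capture group, i.e. the text between the tags.
    return [regex stringByReplacingMatchesInString:string
                                           options:0
                                             range:NSMakeRange(0, string.length)
                                      withTemplate:@"$2"];
}
```

Calling StripPseudoTags(@"emword/em") should return @"word", while strings without a recognised tag pair come back unchanged.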

In action: http://regex101.com/r/qL3cU9

Input:

  • iword/i
  • emword/em
  • bword/b
  • ibword/ib
  • notgoing/tomatch this

Substitution result:

  • word
  • word
  • word
  • ibword/ib
  • notgoing/tomatch this
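One thing the results above don't exercise: because the opening and closing alternations are independent, the pattern also strips mismatched pairs, so iword/b becomes word. If that matters, a possible tightening (not part of the original answer) is to use the backreference \1, so the closing tag must repeat whatever the opening tag matched. A sketch under the same NSRegularExpression assumption; StripMatchedPseudoTags is a hypothetical name.

```objc
#import <Foundation/Foundation.h>

// Hypothetical variant: \1 forces the closing tag to equal the opening tag,
// so mismatched pairs such as "iword/b" are left alone.
static NSString *StripMatchedPseudoTags(NSString *string) {
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"\\b(i|em|b)(\\w*)/\\1\\b"
                                                  options:NSRegularExpressionCaseInsensitive
                                                    error:&error];
    if (regex == nil) {
        NSLog(@"Invalid pattern: %@", error);
        return string;
    }
    // Same template as before: keep only the text between the tags.
    return [regex stringByReplacingMatchesInString:string
                                           options:0
                                             range:NSMakeRange(0, string.length)
                                      withTemplate:@"$2"];
}
```

The trade-off is that every tag pair must be written consistently; a typo in the closing tag now leaves the whole token in place instead of silently removing it.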