RegEx to Reject Unescaped HTML Character

I want to restrict usage of unescaped ampersands in a particular input field. I'm having trouble writing a RegEx that rejects "&" unless it is followed by "amp;", or perhaps one that just rejects "& " (note the space).

I tried to adapt the answer in this thread, but to no avail. Thanks.

(FWIW, here's a RegEx I made to ensure that a filename field didn't contain restricted characters and ended in .mp3. It works fine, but does it look efficient?)

^[^&,<,>,:,",/,\\,|,?,\*]+(\.mp3|\.MP3|\.Mp3|\.mP3)$

There are 2 answers

Answer by Mark Byers (best answer)

This regular expression matches any occurrence of & which is not followed by amp;:

/&(?!amp;)/

This regular expression accepts strings that consist only of characters other than &, or of the sequence &amp;:

/^([^&]|&amp;)*$/

You can use either one, depending on which is more convenient. The difference is that the string should be rejected if the first regular expression matches, whereas it should be accepted if the second one matches.
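
To make the reject/accept difference concrete, here is a minimal sketch in Python. The question doesn't say which language or engine is in use, so Python's re module is an assumption, and the function names are made up for illustration. Both functions should give the same verdict for any input.

```python
import re

# Approach 1: the input is invalid if this pattern is found anywhere in it.
UNESCAPED_AMP = re.compile(r"&(?!amp;)")

# Approach 2: the input is valid only if the whole string matches this pattern.
ONLY_ESCAPED_AMP = re.compile(r"^([^&]|&amp;)*$")


def is_valid_by_rejection(text: str) -> bool:
    """Hypothetical helper: valid when no bare '&' is found."""
    return UNESCAPED_AMP.search(text) is None


def is_valid_by_acceptance(text: str) -> bool:
    """Hypothetical helper: valid when the entire string matches."""
    return ONLY_ESCAPED_AMP.match(text) is not None


if __name__ == "__main__":
    for sample in ["Tom &amp; Jerry", "Tom & Jerry", "&amp", "&lt;b&gt;", ""]:
        print(repr(sample), is_valid_by_rejection(sample), is_valid_by_acceptance(sample))
```

Note that both patterns only recognise &amp;, so other entities such as &lt; are treated as unescaped ampersands and rejected.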

Answer by Twisol

You can match on /&(?!amp;)/ to locate any & that is not followed by amp;. The (?!...) construct is called a negative lookahead.

That assumes you're using a regexp engine that supports lookaheads, at any rate. I know Perl/PCRE regexps do.
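
As a rough illustration of "locating" with the lookahead, here is a small Python sketch. Python's re module is an assumption (it does support negative lookaheads), and the function name is invented for this example. It returns the index of every & that is not the start of &amp;, which could be used to point the user at the offending characters.

```python
import re
from typing import List

# '&' followed by anything other than 'amp;' (including end of string).
UNESCAPED_AMP = re.compile(r"&(?!amp;)")


def find_unescaped_ampersands(text: str) -> List[int]:
    """Hypothetical helper: indices of each bare '&' in text."""
    return [match.start() for match in UNESCAPED_AMP.finditer(text)]


if __name__ == "__main__":
    print(find_unescaped_ampersands("a & b &amp; c &"))
```

The lookahead consumes no characters, so each match is just the single & itself, and a trailing & at the very end of the string is also reported.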