Regex Expressions For Emoji


http://jsfiddle.net/bxeLyneu/1/

function custom() {
    var str = document.getElementById('original').innerHTML;
    var replacement = str.replace(/\B:poop:\B/g,'REPLACED');
    document.getElementById('replaced').innerHTML = replacement;
}
custom()

Yes = :poop: should be replaced with "REPLACED". No = :poop: should not be replaced; in other words, it should remain untouched.

Numbers 4, 5, and 6 don't seem to follow the rule provided. I do know why, but I don't know how to combine multiple expressions into one. I have tried many others, but I just can't get them to work the way I want them to. Odds aren't in my favor.

And yes, this is very similar to how Facebook emoji work in the chat box.

New issue:


http://jsfiddle.net/xaekh8op/13/

/(^|\s):bin:(\s|$)/gm

It is unable to scan and replace the one in the middle. How can I fix that?


There is 1 answer

Jay Bosamiya On BEST ANSWER

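\B means "any location not at a word boundary", whereas \s means "whitespace". Because : is not a word character, \B next to :poop: succeeds beside any other non-word character (punctuation included), not only beside whitespace. Based upon your given examples, the following code handles them: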

function custom() {
    var str = document.getElementById('original').innerHTML;
    var replacement = str.replace(/([\s>]|^):poop:(?=[\s<]|$)/gm,'$1REPLACED');
    document.getElementById('replaced').innerHTML = replacement;
}
custom()

http://jsfiddle.net/xaekh8op/15/
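
To try the same pattern outside the browser, here is a minimal sketch that wraps it in a plain function. The name replaceEmoji and the sample strings are hypothetical, chosen only for illustration; they are not taken from the fiddle.

```javascript
// Hypothetical helper: applies the answer's pattern to a plain string
// instead of reading from and writing to the DOM.
function replaceEmoji(str) {
    return str.replace(/([\s>]|^):poop:(?=[\s<]|$)/gm, '$1REPLACED');
}

// Made-up sample inputs covering the delimiters the pattern accepts.
const standalone = replaceEmoji(':poop:');              // start and end of line
const inSentence = replaceEmoji('hello :poop: world');  // whitespace on both sides
const insideTag = replaceEmoji('<p>:poop:</p>');        // > before, < after
const gluedToWord = replaceEmoji('hello:poop:');        // letter directly before
const beforePunct = replaceEmoji(':poop:!');            // punctuation directly after

console.log([standalone, inSentence, insideTag, gluedToWord, beforePunct]);
```

The first three inputs should come back with REPLACED in place of :poop:, while the last two should be returned unchanged, because a letter or a punctuation mark is not one of the accepted delimiters.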

Explanation:

The regular expression ([\s>]|^):poop:(?=[\s<]|$) stands for the following:

Regular expression visualization (image created in Debuggex)

The pattern starts by matching one of \s or > (or ^, which with the m flag means start of line) and captures it as group 1 so that it can be put back later. The same idea applies after :poop: (\s, <, or end of line $), but there it is done with a look-ahead (the (?= ...) syntax), which checks that [\s<]|$ follows without consuming it. Because the trailing whitespace is not consumed, it is still available as the leading delimiter of the next match, which is what lets consecutive emoji separated by a single space all be replaced. The < and > take care of any HTML tags that might be right beside the :poop:. The $1 in the replacement string $1REPLACED puts the first group back, so only the :poop: itself is replaced with REPLACED. The second "group" is just a look-ahead, so nothing needs to be put back for it.
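
The difference between consuming the trailing delimiter and merely looking ahead at it can be seen in a small sketch. It uses the :bin: pattern from the question; the input string and the X replacement are assumptions made up for this illustration, and the consuming variant assumes both captured groups are put back.

```javascript
// Three emoji separated by single spaces (hypothetical input).
const text = ':bin: :bin: :bin:';

// Consuming version: (\s|$) eats the space after the first match, so the
// second :bin: no longer has a leading \s available and is skipped.
const consuming = text.replace(/(^|\s):bin:(\s|$)/gm, '$1X$2');

// Look-ahead version: (?=\s|$) only checks for the space, leaving it in
// place to serve as the leading delimiter of the next match.
const lookahead = text.replace(/(^|\s):bin:(?=\s|$)/gm, '$1X');

console.log(consuming); // X :bin: X
console.log(lookahead); // X X X
```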

For further information on word boundaries, you can refer to http://www.regular-expressions.info/wordboundaries.html which says:

There are three different positions that qualify as word boundaries:

  • Before the first character in the string, if the first character is a word character.
  • After the last character in the string, if the last character is a word character.
  • Between two characters in the string, where one is a word character and the other is not a word character.
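
The three rules above can be checked with a short sketch that marks every \b position (and every \B position) in a string with a | character, then applies the question's original \B pattern. The sample strings are hypothetical and chosen only to illustrate the rules.

```javascript
const sample = 'hi :poop:';

// Insert | at every word boundary, then at every non-boundary.
const wordBoundaries = sample.replace(/\b/g, '|');
const nonBoundaries = sample.replace(/\B/g, '|');

console.log(wordBoundaries); // |hi| :|poop|:
console.log(nonBoundaries);  // h|i |:p|o|o|p:|

// Since : is not a word character, \B beside it only requires that the
// neighbouring character is also a non-word character (or that there is none).
const lenient = '!:poop:!'.replace(/\B:poop:\B/g, 'REPLACED');
const blocked = 'a:poop:'.replace(/\B:poop:\B/g, 'REPLACED');

console.log(lenient); // !REPLACED!
console.log(blocked); // a:poop:
```

Note that there is no boundary between the space and the first colon, or after the final colon, since both neighbours there are non-word characters (or absent). That is why \B accepts punctuation next to :poop: just as readily as whitespace.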