Here is the list of emoticons: http://en.wikipedia.org/wiki/List_of_emoticons

I want to build a regex that checks whether any of these emoticons appear in a sentence, for example "hey there I am good :)" or "I am angry and sad :(". However, there are a lot of emoticons in that Wikipedia list, so I am wondering how to achieve this. I am new to regex and Python.

This is all I have so far:
```python
>>> s = "hey there I am good :)"
>>> import re
>>> q = re.findall(":",s)
>>> q
[':']
```
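I see two approaches to your problem:

1. Build a regular expression for a "generic" smiley that follows a certain schema (for example: eyes, an optional nose, and a mouth) and try to match as many smileys as possible without making the pattern overly complicated.
2. Take exactly the smileys you want to match (for example, from that Wikipedia list), escape any characters that are special in regular expressions, and join them into one big alternation.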
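A minimal sketch of both approaches in Python 3. The character sets for eyes, noses, and mouths and the short `SMILEYS` list are illustrative assumptions, not taken from the Wikipedia page, and the names `GENERIC_SMILEY` and `LISTED_SMILEY` are hypothetical. Only standard `re` functions (`re.escape`, `re.compile`, `findall`) are used.

```python
import re

# Approach 1: a "generic" smiley = eyes, an optional nose, and a mouth.
# These character sets are illustrative assumptions; adjust them to taste.
EYES = ":;8BX="
NOSES = "-~'^"
MOUTHS = ")(/\\|DP"

# re.escape() makes characters such as ^, -, ( and \ literal inside [...].
GENERIC_SMILEY = re.compile(
    "[%s][%s]?[%s]" % tuple(re.escape(chars) for chars in (EYES, NOSES, MOUTHS))
)

# Approach 2: an explicit list of smileys joined into one big alternation.
# This short list is a hypothetical sample; paste in the emoticons you need.
SMILEYS = [":-)", ":)", ":-(", ":(", ":D", ":P", ";)", "=)", "XD", "^_^"]

# Escape each smiley and try longer ones first, so that a longer smiley is
# never cut short by a shorter entry that is its prefix (e.g. ":DD" vs ":D").
LISTED_SMILEY = re.compile(
    "|".join(re.escape(s) for s in sorted(SMILEYS, key=len, reverse=True))
)

text = "hey there I am good :) but also angry and sad :( and =D"
print(GENERIC_SMILEY.findall(text))  # [':)', ':(', '=D']
print(LISTED_SMILEY.findall(text))   # [':)', ':(']
```

With this sample text the generic pattern also finds `=D`, which is not in the hand-written list, while the list-based pattern matches only what you explicitly put into it.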
Both approaches have pros, cons, and some general limitations. You will always get false positives, for example in a mathematical expression such as `18^P`. Requiring spaces around the smiley might help, but then you can no longer match smileys that are directly followed by punctuation. The first approach is more powerful and catches smileys that the second approach misses, but only as long as they follow a certain schema. You could use the same approach for "eastern" smileys, but it won't work for strictly symmetric ones such as `=^_^=`, because strictly symmetric strings do not form a regular language. The second approach, on the other hand, is easier to extend with new smileys, as you just have to add them to the list.
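To make the false-positive and punctuation trade-off described above concrete, here is a small sketch using a hypothetical "eyes, optional nose, mouth" schema. Instead of literal spaces it uses the lookarounds `(?<!\S)` and `(?!\S)`, an assumed variant of "spaces around the expression" that also accepts the start or end of the string and does not consume the surrounding whitespace.

```python
import re

# A hypothetical "eyes, optional nose, mouth" schema for Western smileys.
EYES, NOSES, MOUTHS = ":;8BX=", "-~'^", ")(/\\|DP"
SMILEY = "[%s][%s]?[%s]" % tuple(re.escape(c) for c in (EYES, NOSES, MOUTHS))

# Unrestricted: matches a smiley anywhere in the text.
anywhere = re.compile(SMILEY)

# The smiley must be preceded and followed by whitespace or the start/end of
# the string. Lookarounds check the neighbours without consuming them.
spaced = re.compile(r"(?<!\S)(?:%s)(?!\S)" % SMILEY)

math_text = "the value of 18^P is large"
punct_text = "I am good :)."

print(anywhere.findall(math_text))   # ['8^P']  <- false positive
print(spaced.findall(math_text))     # []       <- false positive avoided
print(anywhere.findall(punct_text))  # [':)']
print(spaced.findall(punct_text))    # []       <- real smiley missed
```

Neither variant is strictly better: the restricted pattern trades the `18^P` false positive for missing `:)` when it is followed by a period.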