How to select only rows containing emojis and emoticons in Python?

I have a pandas DataFrame like the one below:

sentence
------------

I like it
+1
One :-) :)
hah

I need to select only the rows that contain emoticons or emojis, so the result should look like this:

sentence
------------

+1
One :-) :)

How can I do that in Python?

1 answer

Answer by mozway:

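You can select Unicode emojis with a regex character range. Note that the range is coarse: it matches every character from U+263A to U+1F645, which also covers non-emoji characters such as CJK ideographs.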

df2 = df[df['sentence'].str.contains(r'[\u263a-\U0001f645]')]

output:

  sentence
0      
2     +1
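
For a self-contained run, the filter above can be sketched as follows. The sample rows are hypothetical, and the two emoji (U+1F600 and U+1F44D) are assumptions picked for illustration; they are written as escape sequences so the snippet does not depend on the file encoding.

```python
import pandas as pd

# Hypothetical sample data, not the asker's actual DataFrame.
df = pd.DataFrame({
    "sentence": [
        "\U0001F600",       # grinning face (assumed example)
        "I like it",
        "+1 \U0001F44D",    # thumbs up (assumed example)
        "One :-) :)",
        "hah",
    ]
})

# Same character range as in the answer: U+263A through U+1F645.
emoji_range = r"[\u263a-\U0001f645]"

# str.contains returns a boolean Series that is used as a row mask.
mask = df["sentence"].str.contains(emoji_range, regex=True)
df2 = df[mask]
print(df2)
```

With this sample data the mask keeps the rows at index 0 and 2, and leaves out the row that only contains ASCII smileys.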

ASCII emoticons are much more ambiguous, because there is no standard definition and the number of possible combinations is practically endless. If you limit the match to smiley faces made of eyes (':' or ';'), an optional middle character, and a mouth (')' or '('), you could use:

df[df['sentence'].str.contains(r'[\u263a-\U0001f645]|(?:[:;]\S?[\)\(])')]

output:

     sentence
0         
2        +1
3  One :-) :)

However, this would still miss plenty of other ASCII emoticons: :O, :P, 8D, etc.
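
One way to cover more of these is to widen the eyes, nose, and mouth character sets. The following is only a sketch: the character sets and the names `EYES`, `NOSE`, `MOUTH`, `EMOTICON_PATTERN`, and `has_emoticon` are hypothetical choices, not part of the original answer, and the result is still a heuristic.

```python
import re

# Hypothetical, hand-picked character sets; extend them to suit your data.
EYES = r"[:;=8]"
NOSE = r"[-o^']?"             # optional nose
MOUTH = r"[)(\]\[DPpOo/\\|]"

# (?<!\S): the emoticon must start the string or follow whitespace,
#          which keeps "http://" from being read as ":/".
# (?!\w):  no letter, digit, or underscore may follow,
#          so ":Party" is not read as ":P".
EMOTICON_PATTERN = rf"(?<!\S){EYES}{NOSE}{MOUTH}(?!\w)"


def has_emoticon(text: str) -> bool:
    """Return True if the text contains at least one emoticon-like token."""
    return re.search(EMOTICON_PATTERN, text) is not None


samples = ["One :-) :)", "I like it", "8D", ":P", ":O", "http://example.com", "hah"]
for s in samples:
    print(repr(s), has_emoticon(s))
```

Because the pattern uses no capturing groups, it can be joined to the emoji range and passed to pandas, for example `df['sentence'].str.contains(r'[\u263a-\U0001f645]|' + EMOTICON_PATTERN)`. It still misfires on text such as "item 8)", so treat the matches as candidates rather than a definitive classification.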