I have $text = "Hello üäö$"
I want to remove just the emojis from the text using XQuery. How can I do that?
Expected result : "Hello üäö$"
I tried:
replace($text, '\p{IsEmoticons}+', '')
but it didn't work; it only removed the smileys.
Current result: "Hello üäö$" Expected result: "Hello üäö$"
Thanks in advance :)
I outlined the approach in my answer to the original question, which I updated based on your comment asking about how to strip out .
Quoting from that expanded answer:
This approach can be applied to , , or any other character:
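As a rough sketch of this idea (not necessarily the exact query from the linked answer), the characters to strip can be listed as explicit code point ranges in a character class. The input string below is hypothetical; it uses character references for U+1F600 (Emoticons block) and U+1F30D (Miscellaneous Symbols and Pictographs block), and the two ranges are the boundaries of those blocks as given in the Unicode charts:

```xquery
xquery version "3.1";

(: Hypothetical input: U+1F600 is in the Emoticons block,
   U+1F30D is in Miscellaneous Symbols and Pictographs :)
let $text := "Hello &#x1F600;üäö&#x1F30D;$"
return
  (: U+1F300-U+1F5FF: Miscellaneous Symbols and Pictographs
     U+1F600-U+1F64F: Emoticons :)
  replace($text, "[&#x1F300;-&#x1F5FF;&#x1F600;-&#x1F64F;]", "")
  (: returns "Hello üäö$" :)
```

Writing the ranges as code points avoids depending on which block names a given processor recognizes; further ranges can be appended inside the brackets as needed.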
Alternatively, rather than locating the blocks of characters you want to strip out, you could identify the blocks of characters you want to preserve. For example, given the example string in the original post, perhaps the goal is to preserve only those characters in the "Basic Latin" block. To do so, we can match characters NOT in this block via the \P category escape:
This query returns:
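A minimal sketch of such a query, assuming a hypothetical input that mixes Basic Latin characters, characters with diacritics, and two emoji written as character references. \P{IsBasicLatin} matches every character outside the Basic Latin block (U+0000 to U+007F):

```xquery
xquery version "3.1";

(: Hypothetical input; &#x1F600; and &#x1F30D; are emoji :)
let $text := "Hello &#x1F600;üäö&#x1F30D;$"
return
  (: Remove every character that is NOT in the Basic Latin block :)
  replace($text, "\P{IsBasicLatin}", "")
  (: returns "Hello $" :)
```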
Notice that this has stripped out the characters with diacritics, which perhaps isn't desired. These characters with diacritics belong to the Latin-1 Supplement block. To preserve characters from both the Latin and Latin-1 Supplement blocks, we'd need to adjust the query as follows:
... which returns:
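A sketch of the adjusted query under the same assumptions (hypothetical input, emoji written as character references). A negated character class lists both blocks to keep, so anything outside Basic Latin and Latin-1 Supplement (U+0080 to U+00FF) is removed:

```xquery
xquery version "3.1";

(: Hypothetical input; &#x1F600; and &#x1F30D; are emoji :)
let $text := "Hello &#x1F600;üäö&#x1F30D;$"
return
  (: Remove every character outside Basic Latin and Latin-1 Supplement :)
  replace($text, "[^\p{IsBasicLatin}\p{IsLatin-1Supplement}]", "")
  (: returns "Hello üäö$" :)
```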
This now preserves the characters with diacritics.
To be precise about the characters you preserve or remove, you need to consult the Unicode blocks and charts.
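One way to find out which block a character belongs to is to list the code points in the string and look them up in the charts. A small sketch, again with a hypothetical input; it reports decimal code points, which need converting to hexadecimal for the charts (for example 252 is U+00FC and 128512 is U+1F600):

```xquery
xquery version "3.1";

(: Hypothetical input; &#x1F600; is an emoji :)
let $text := "Hello üäö$ &#x1F600;"
for $cp in string-to-codepoints($text)
(: Skip plain ASCII so only the characters worth checking are listed :)
where $cp gt 127
return codepoints-to-string($cp) || " = " || $cp
(: returns four strings, ending in 252, 228, 246 and 128512 :)
```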