I have $text = "Hello üäö$".
I want to remove just the emojis from the text using XQuery. How can I do that?
Expected result: "Hello üäö$"
I tried to use:
replace($text, '[^\x00-\xFFFF]', '')
but it didn't work.
Thanks in advance :)
To remove emoji, you can make use of XPath's support for character class escapes, specifically category and block escapes, which match named Unicode blocks such as Emoticons (\p{IsEmoticons}).
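A minimal sketch of the block-escape approach follows. It assumes your processor recognizes the block name "Emoticons" (U+1F600 to U+1F64F), which depends on the Unicode version it implements. The sample emoji &#x1F600; (grinning face) is an assumption, written as a character reference so the example survives copy and paste.

```xquery
(: Remove every character in the Unicode "Emoticons" block.
   Assumes the processor supports the block name "Emoticons".
   &#x1F600; is a hypothetical sample emoji, expanded by XQuery
   inside the string literal before replace() sees it. :)
let $text := "Hello &#x1F600;üäö$"
return replace($text, '\p{IsEmoticons}', '')
```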
This returns the expected result, "Hello üäö$".
The "Emoticons" block (U+1F600 to U+1F64F) doesn't contain all characters commonly associated with "emoji." For example, 💜 (Purple Heart, U+1F49C), according to a site like https://www.compart.com/en/unicode/U+1F49C that lets you look up Unicode character information, belongs to the Miscellaneous Symbols and Pictographs block (U+1F300 to U+1F5FF).
This block is not available in XPath or XQuery processors, since it is listed neither in the XML Schema 1.0 specification nor in "Unicode block names for use in XSD regular expressions", the list of blocks that XPath and XQuery processors conforming to XML Schema 1.1 are required to support.
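If you'd rather confirm the block membership from XQuery itself instead of a lookup site, a small sketch like this compares code points. The block bounds U+1F300 and U+1F5FF are taken from the block information above; using string-to-codepoints on character references avoids hand-converting hex to decimal.

```xquery
(: Is U+1F49C (Purple Heart) inside Miscellaneous Symbols and Pictographs,
   i.e. between U+1F300 and U+1F5FF inclusive? :)
let $cp    := string-to-codepoints("&#x1F49C;")
let $first := string-to-codepoints("&#x1F300;")
let $last  := string-to-codepoints("&#x1F5FF;")
return $cp ge $first and $cp le $last
(: returns true :)
```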
For characters from blocks that are not available in XPath or XQuery, you can construct character classes manually, using XML character references for the code points. For example, given the purple heart character above, we can match it with a range covering its whole block.
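A sketch of the manual character class, assuming the range of the Miscellaneous Symbols and Pictographs block (U+1F300 to U+1F5FF) is what you want to strip. The sample text with &#x1F49C; is illustrative.

```xquery
(: Strip everything from U+1F300 to U+1F5FF with an explicit range.
   XQuery expands the &#x...; references inside the string literal,
   so the regex receives the actual characters, not escape sequences. :)
let $text := "Hello &#x1F49C;üäö$"
return replace($text, '[&#x1F300;-&#x1F5FF;]', '')
(: returns "Hello üäö$" :)
```

Because the Emoticons block (U+1F600 to U+1F64F) directly follows this one, a single range such as '[&#x1F300;-&#x1F64F;]' covers both blocks without relying on any block name.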
This returns the expected result, "Hello üäö$".
If you're wondering why we use &#x1F300; and not U+1F300 or \x1F300, it is because, as Michael Kay noted above, "XQuery uses the XML escape convention, not the C/Java escape convention \xFFFF." (I've updated the answer in response to the other very helpful comments.)
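Applying the same convention to the attempt in the question gives the sketch below. Two details matter: the regex dialect has no \x escape, and &#xFFFF; is itself not a legal XML character, so the negated class is replaced by the positive supplementary-plane range U+10000 to U+10FFFF. This is broader than "emoji": it also removes non-emoji such as U+1D400 (MATHEMATICAL BOLD CAPITAL A), and it keeps emoji that live in the Basic Multilingual Plane, such as U+2764 (HEAVY BLACK HEART).

```xquery
(: The question's '[^\x00-\xFFFF]' rewritten with XML character references:
   remove every character outside the Basic Multilingual Plane.
   The sample emoji are illustrative. :)
let $text := "Hello &#x1F600;&#x1F49C;üäö$"
return replace($text, '[&#x10000;-&#x10FFFF;]', '')
(: returns "Hello üäö$" :)
```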