Get an emoticon's Unicode code point from UTF-16 chars

I need to intercept an emoticon as it is entered and replace it with my own emoticon. When I intercept an emoticon, for example FACE WITH MEDICAL MASK (U+1F604), I get a UTF-16 char pair (0xD83D 0xDE04). Is it possible to convert this char value to the Unicode value?

I need to convert 0xD83D 0xDE04 to 0x1F604 (U+1F604).

Thanks,

There are 2 answers

bobince (best answer)

"I get a UTF-16 char pair (0xD83D 0xDE04). Is it possible to convert this char value to the Unicode value?"

For just a single code point in a string, you can convert it to an integer with:

int codepoint = "\uD83D\uDE04".codePointAt(0);  // 0x1F604
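For reference, the arithmetic behind that conversion can be written out by hand. This is only a sketch: the class name `SurrogateMath` and the method `toCodePoint` are made up for illustration, and in real code the built-in `Character.toCodePoint(high, low)` does the same job.

```java
final class SurrogateMath {
    // Combine a high and a low surrogate into one code point by hand.
    static int toCodePoint(char high, char low) {
        if (!Character.isSurrogatePair(high, low)) {
            throw new IllegalArgumentException("not a valid surrogate pair");
        }
        // Each surrogate carries 10 payload bits; together they encode
        // (code point - 0x10000).
        return ((high - 0xD800) << 10) + (low - 0xDC00) + 0x10000;
    }

    public static void main(String[] args) {
        int cp = toCodePoint('\uD83D', '\uDE04');
        // Prints the code point in hex, next to the JDK's own result.
        System.out.println(Integer.toHexString(cp) + " "
                + Integer.toHexString(Character.toCodePoint('\uD83D', '\uDE04')));
    }
}
```

For 0xD83D 0xDE04 this works out to (0x3D << 10) + 0x204 + 0x10000 = 0x1F604.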

It is, however, quite tedious to go over a whole string with codePointCount/codePointAt. Java/Dalvik's String type is strongly tied to UTF-16 code units, and the codePoint methods are a poorly integrated afterthought. If you simply want to replace an emoji with some other string of characters, you are probably best off doing a plain string replace or regex with the two code units as they appear in the String type, e.g. text.replace("\uD83D\uDE04", ":-D").
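To make the comparison concrete, here is a rough sketch of what the code-point walk looks like. The class `EmojiReplacer` and the method `replaceCodePoint` are hypothetical names, not part of any library. It only handles a single-code-point emoji, not sequences joined with modifiers or zero-width joiners.

```java
final class EmojiReplacer {
    // Replace every occurrence of one code point with a replacement string.
    static String replaceCodePoint(String text, int target, String replacement) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (cp == target) {
                out.append(replacement);
            } else {
                out.appendCodePoint(cp);
            }
            // Advance by 1 char for BMP characters, 2 for a surrogate pair.
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "hi \uD83D\uDE04!";
        // Both lines should print the same result.
        System.out.println(replaceCodePoint(text, 0x1F604, ":-D"));
        System.out.println(text.replace("\uD83D\uDE04", ":-D"));
    }
}
```

For a fixed, known emoji the one-line `replace` gives the same result with far less code. The loop is mainly useful when the decision depends on the numeric code point, for example when matching a whole range.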

(BTW, Face with medical mask is U+1F637; U+1F604 is Smiling face with open mouth and smiling eyes.)

chiuki

0x1F604 is the code point of that emoticon, and its UTF-32 encoding is that same number. You can convert this way:

byte[] bytes = "\uD83D\uDE04".getBytes("UTF-32BE");  // {0x00, 0x01, 0xF6, 0x04}; throws UnsupportedEncodingException
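The result of that call is a byte array, so one more step is needed to get an int. A minimal sketch, assuming the string is non-empty and the runtime provides the UTF-32BE charset; the names `Utf32Demo` and `firstCodePointViaUtf32` are invented for this example:

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;

final class Utf32Demo {
    // Encode to UTF-32BE and read the first four bytes as a big-endian int.
    static int firstCodePointViaUtf32(String s) {
        // Charset.forName avoids the checked UnsupportedEncodingException
        // that getBytes(String) declares.
        byte[] bytes = s.getBytes(Charset.forName("UTF-32BE"));
        // ByteBuffer is big-endian by default, matching UTF-32BE.
        return ByteBuffer.wrap(bytes).getInt();
    }

    public static void main(String[] args) {
        int cp = firstCodePointViaUtf32("\uD83D\uDE04");
        System.out.println(Integer.toHexString(cp));
    }
}
```

This allocates a byte array just to read one number, so `codePointAt(0)` is probably the simpler route when only the code point is needed.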