Emoji are not being encoded correctly by the output writer

The program takes in a comment and persists it. The database stores the value correctly (I'm able to copy and paste it into an emoji page and it shows up correctly). The string I see in the debugger on the postComment request and in the getAllComments response is the same, but the response contains the bytes {0xED, 0xA0, 0xBD, 0xED, 0xB8, 0x80} instead of {0xF0, 0x9F, 0x98, 0x80}, and the emoji shows up as several characters instead of one. If I set the encoding to UnicodeBig, the emoji shows up correctly in the response, but we need to use UTF-8.
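
For reference, both byte sequences above are the same character, U+1F600 (grinning face), encoded two different ways. A Java String holds it as the UTF-16 surrogate pair 0xD83D 0xDE00. Proper UTF-8 encodes the whole code point as four bytes, while encoding each surrogate on its own as a three-byte sequence (the CESU-8 style) gives the six bytes seen in the response. The sketch below reproduces both; the class EmojiBytes and the helper encodeEachCharSeparately are hypothetical and only imitate the faulty output, they are not taken from Jetty's code.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class EmojiBytes {

    // Hypothetical helper that imitates the broken output: every UTF-16 char,
    // including each half of a surrogate pair, is encoded as its own UTF-8
    // sequence (CESU-8 style). Not a real library method.
    static byte[] encodeEachCharSeparately(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                out.write(c);                          // 1 byte
            } else if (c < 0x800) {
                out.write(0xC0 | (c >> 6));            // 2 bytes
                out.write(0x80 | (c & 0x3F));
            } else {
                out.write(0xE0 | (c >> 12));           // 3 bytes, also used for lone surrogates
                out.write(0x80 | ((c >> 6) & 0x3F));
                out.write(0x80 | (c & 0x3F));
            }
        }
        return out.toByteArray();
    }

    // Formats bytes as space-separated uppercase hex, e.g. "F0 9F".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String emoji = "\uD83D\uDE00"; // U+1F600, held as a surrogate pair
        System.out.println(emoji.length());                                // 2
        System.out.println(emoji.codePointCount(0, emoji.length()));       // 1
        System.out.println(hex(emoji.getBytes(StandardCharsets.UTF_8)));   // F0 9F 98 80
        System.out.println(hex(encodeEachCharSeparately(emoji)));          // ED A0 BD ED B8 80
    }
}
```

The string itself is therefore fine (one code point in two chars); the problem lies only in how it is turned into bytes on the way out.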

String jsonString = jsonMapper.toJson(jsonResponse);
response.setContentType("application/json");
response.setCharacterEncoding("UTF-8");
response.getWriter().println(jsonString);

Do I need to do something to these strings before having the system encode them as UTF-8? The libraries used are:

json-simple 1.1 (current is 1.1.1) and jackson-core 2.2.3 (current is 2.6)

Thank you.

Sean Carlisle answered:

This turned out to be an issue with Jetty 7. To get around it, call .getBytes(String) on the JSON string with "UTF-8" as the charset, write the resulting bytes to the response output stream, and flush the buffer. Java then produces the correct bytes instead of encoding each half of the surrogate pair separately, which is what Jetty was doing by default.
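
A minimal sketch of that workaround, assuming the goal is simply to let Java's own UTF-8 encoder produce the bytes instead of the container's writer. The class Utf8ResponseWriter and the method writeUtf8 are hypothetical names; the method takes a plain OutputStream so it can run without a servlet container. StandardCharsets.UTF_8 is used in place of getBytes("UTF-8") because it avoids the checked UnsupportedEncodingException; both produce the same bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class Utf8ResponseWriter {

    // Hypothetical helper: encodes the text to UTF-8 in Java and writes the raw
    // bytes, bypassing the container's character writer entirely.
    // In a servlet, `out` would be response.getOutputStream().
    static void writeUtf8(String json, OutputStream out) throws IOException {
        byte[] bytes = json.getBytes(StandardCharsets.UTF_8); // same bytes as getBytes("UTF-8")
        out.write(bytes);
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        writeUtf8("\uD83D\uDE00", buffer); // U+1F600
        for (byte b : buffer.toByteArray()) {
            System.out.printf("%02X ", b & 0xFF);
        }
        System.out.println(); // prints the four bytes F0 9F 98 80
    }
}
```

In the servlet from the question, this would replace response.getWriter().println(jsonString) with writeUtf8(jsonString, response.getOutputStream()) followed by response.flushBuffer(), keeping the setContentType and setCharacterEncoding calls so the client still sees UTF-8 JSON. A single response cannot use both getWriter() and getOutputStream(), and println also appended a line separator, which this version does not.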