com.fasterxml.jackson.databind.ObjectMapper encoding for localised characters


A little background before I get to the main issue.

We have a module that converts POJOs to JSON using FasterXML Jackson. Multiple XML files are first converted into POJOs and then into JSON.

These individual JSON documents are then merged into a single JSON document, which is processed by a third party.

Up to the point where each individual JSON is produced, everything looks fine.

Once all the JSONs are merged and written to a file, the localised characters appear as \uXXXX escape sequences, whereas we want them to look the way they do in the individual JSON.

E.g. individual JSON snippet:

{"title":"Web サーバに関するお知らせ"}

E.g. merged JSON snippet:

{"title":"Web \u30b5\u30fc\u30d0\u306b\u95a2\u3059\u308b\u304a\u77e5\u3089\u305b"}

byte[] jsonBytes = objectMapper.writeValueAsBytes(object);
String jsonString = new String(jsonBytes, "UTF-8");

This JSON string is then written to a file:

BufferedWriter writer = new BufferedWriter(new FileWriter(finalJsonPath));
writer.write(jsonString);

I also tried the following, as I thought we needed UTF-8 encoding here for the localised characters:

BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(finalJsonPath),"UTF-8"));
writer.write(jsonString);

The same ObjectMapper code is used to write the individual JSON as well, and the escaping does not appear at that point.

Can anyone point out what is causing the encoding issue at the merged JSON level?

PS: The code is part of a WAR deployed on Tomcat. Initially we saw ??? (question marks) in the JSON, after which we added the following to catalina.sh:

JAVA_OPTS="$JAVA_OPTS -Dfile.encoding=UTF-8"

Later on, I also added the servlet request encoding, but that did not help:

JAVA_OPTS="$JAVA_OPTS -Dfile.encoding=UTF-8 -Djavax.servlet.request.encoding=UTF-8"

Thanks!

There is 1 answer

HungryForKnowledge

I just noticed that the code was post-processing the merged JSON. It runs the native2ascii command on the merged file, which converts the localised content in the JSON into ASCII \uXXXX escape sequences.

I ran native2ascii with the -reverse option on the JSON and my finding was confirmed: -reverse reverted the ASCII escaping.
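
For anyone who wants to see the effect in isolation, here is a minimal Java sketch that approximates what native2ascii and native2ascii -reverse do to a string. The class and method names (UnicodeEscapes, escapeNonAscii, unescape) are made up for illustration, and this is a simplification rather than the tool's actual implementation: for example, it does not give special treatment to an escaped backslash in front of the 'u'.

```java
final class UnicodeEscapes {

    // Approximates native2ascii: ASCII is kept as-is, every other UTF-16
    // code unit becomes a backslash-u escape with four lowercase hex digits.
    static String escapeNonAscii(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c < 0x80) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    // Approximates native2ascii -reverse: each backslash-u escape followed
    // by four hex digits is turned back into the character it stands for.
    // Simplification (assumption): escaped backslashes are not considered.
    static String unescape(String input) {
        StringBuilder out = new StringBuilder(input.length());
        int i = 0;
        while (i < input.length()) {
            if (input.startsWith("\\u", i) && i + 6 <= input.length()) {
                String hex = input.substring(i + 2, i + 6);
                if (hex.chars().allMatch(ch -> Character.digit(ch, 16) >= 0)) {
                    out.append((char) Integer.parseInt(hex, 16));
                    i += 6;
                    continue;
                }
            }
            out.append(input.charAt(i));
            i++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The katakana word from the question's title, written with Java
        // source escapes so the file compiles under any source encoding.
        String title = "Web \u30b5\u30fc\u30d0";
        String escaped = escapeNonAscii(title);
        System.out.println(escaped);
        System.out.println(unescape(escaped).equals(title));
    }
}
```

A JSON parser treats an escape sequence and the literal character as the same string value, so the merged file was still valid JSON. It was only harder to read. Dropping the native2ascii step, or reversing it as above, brings the readable form back.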