Error passing Unicode string through JSONObject


I have to pass a Unicode string to a JSONObject:

JSONObject json = new JSONObject("{\"One\":\"\\ud83c\\udf45\\ud83c\\udf46\"}");
json.put("Two", "\ud83c\udf45\ud83c\udf46");
System.out.println(json.toString());

But I get this:

{"One":"🍅🍆","Two":"🍅🍆"}

I want this:

{"One":"\ud83c\udf45\ud83c\udf46","Two":"\ud83c\udf45\ud83c\udf46"}

There are 2 answers

Remy Lebeau On

The system is working as designed. JSON does not require most Unicode characters to be written in \uXXXX form. The quotation mark and the backslash must be escaped (as \" and \\), and control characters <= 0x1F must be escaped as well (as \uXXXX, or with a short escape such as \n where one exists). Any other character may be written as \uXXXX but does not have to be. The characters you have shown do not fall into those ranges, which is why toString() does not encode them as \uXXXX.

When you call new JSONObject(String), it decodes the input string into actual Unicode strings, as if you had done this instead:

JSONObject json = new JSONObject();
json.put("One", "\ud83c\udf45\ud83c\udf46");

Which is perfectly fine. You want the JSONObject to hold un-escaped Unicode data internally.
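To make this concrete, here is a small sketch using only the standard library (the class and method names are made up for illustration). It shows that the escapes in the question are just the UTF-16 surrogate pairs of two characters outside the Basic Multilingual Plane, so the escaped and un-escaped spellings denote the same Java string:

```java
final class SurrogateDemo {
    // Lists the Unicode code points of a string as space-separated "U+..." tokens.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // The same value that ends up stored under "One" and "Two".
        String s = "\ud83c\udf45\ud83c\udf46";
        System.out.println(s.length());                      // 4 UTF-16 code units
        System.out.println(s.codePointCount(0, s.length())); // 2 code points
        System.out.println(codePoints(s));                   // U+1F345 U+1F346
    }
}
```

Each pair of escapes collapses to one code point, which is what the JSONObject holds internally.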

Where you are getting tripped up is that JSONObject.toString() does not write your particular Unicode characters in \uXXXX format. The output is perfectly valid JSON, just not formatted the way you want (why do you want them formatted this way?).

A look at the source for the JSONStringer class (which implements JSONObject.toString() in Android's org.json package) reveals that it only formats non-reserved control characters <= 0x1F in \uXXXX format; all other non-reserved characters are written as-is. This conforms to the JSON specification.
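The rule described above can be sketched roughly as follows. This is an illustration written for this answer, not the library's actual source; the class and method names are hypothetical, and the real implementation may differ in details (for example, it may escape additional characters):

```java
final class JsonQuoteSketch {
    // Quotes a Java string as a JSON string: reserved characters get a short
    // escape, control characters up to 0x1F get a backslash-u escape, and
    // everything else (including surrogate pairs) is copied unchanged.
    static String quote(String value) {
        StringBuilder out = new StringBuilder("\"");
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\t': out.append("\\t"); break;
                case '\b': out.append("\\b"); break;
                case '\f': out.append("\\f"); break;
                default:
                    if (c <= 0x1F) {
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.append('"').toString();
    }
}
```

With this rule, a control character such as U+0001 comes out escaped, while the characters from the question pass through untouched, which matches the output you are seeing.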

To get the output you are asking for, let JSONObject.toString() format the reserved and ASCII characters normally, and then escape the remaining non-ASCII characters yourself, e.g.:

JSONObject json = new JSONObject("{\"One\":\"\\ud83c\\udf45\\ud83c\\udf46\"}");
// decodes as if json.put("One", "\ud83c\udf45\ud83c\udf46")
// or json.put("One", "🍅🍆") were called directly ...

json.put("Two", "\ud83c\udf45\ud83c\udf46");
// same as calling json.put("Two", "🍅🍆") ...

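String s = json.toString();
StringBuilder sb = new StringBuilder();
for (int i = 0; i < s.length(); ++i)
{
    char ch = s.charAt(i);
    // toString() has already escaped the reserved and control characters,
    // so only DEL (0x7F) and non-ASCII UTF-16 code units still need escaping
    if (ch >= 0x7F)
        sb.append(String.format("\\u%04x", (int) ch));
    else
        sb.append(ch);
}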

System.out.println(sb.toString());
// outputs '{"One":"\ud83c\udf45\ud83c\udf46","Two":"\ud83c\udf45\ud83c\udf46"}' as expected ...
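If you need this in more than one place, the loop can be wrapped in a small helper. The sketch below uses hypothetical names and takes already-serialized JSON text as a plain String, so it can be tried without org.json on the classpath. It assumes the input is valid JSON, where non-ASCII characters can only occur inside string values, so escaping them does not change the document's meaning:

```java
final class AsciiJson {
    // Rewrites every UTF-16 code unit at or above 0x7F as a backslash-u escape.
    // Code points outside the BMP are two code units, so they become two escapes.
    static String escapeNonAscii(String json) {
        StringBuilder sb = new StringBuilder(json.length());
        for (int i = 0; i < json.length(); i++) {
            char ch = json.charAt(i);
            if (ch >= 0x7F) {
                sb.append(String.format("\\u%04x", (int) ch));
            } else {
                sb.append(ch);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in for the result of json.toString().
        String raw = "{\"One\":\"\ud83c\udf45\ud83c\udf46\"}";
        System.out.println(escapeNonAscii(raw));
    }
}
```

The result contains only ASCII characters, and any JSON parser will decode it back to the same string values.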
Maroun On

One way of doing this is:

json.put("Two", "\\u" + "d83c" + "\\u" + "df45" + ...);

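This stores the literal text \ud83c\udf45 (a backslash, the letter u and four hex digits) as the value. Note, however, that toString() escapes each backslash, so the printed JSON contains \\ud83c\\udf45, and a parser reading it back gets that literal text rather than the original characters.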