How to use unsupported Locale in Java 11 and numbers in String.format()

4.3k views Asked by At

How can I use an unsupported Locale (eg. ar-US) in JAVA 11 when I output a number via String.format()?

In Java 8 this worked just fine (try jdoodle, select JDK 1.8.0_66):

Locale locale = Locale.forLanguageTag("ar-US");
System.out.println(String.format(locale, "Output: %d", 120));
// Output: 120

Since Java 11 the output is in Eastern Arabic numerals (try jdoodle, use default JDK 11.0.4):

Locale locale = Locale.forLanguageTag("ar-US");
System.out.println(String.format(locale, "Output: %d", 120));
// Output: ١٢٠

It seems, this problem comes from the switch in the Locale Data Providers form JRE to CLDR (source: Localization Changes in Java 9 by @mcarth). Here is a list of supported locales: JDK 11 Supported Locales

UPDATE

I updated the questions example to ar-US, as my example before didn't make sense. The idea is to have a format which makes sense in that given country. In the example it would be the United States (US).

2

There are 2 answers

7
Naman On

The behavior conforms to the CLDR being treated as the preferred Locale. To confirm this, the same snippet in Java-8 could be executed with

-Djava.locale.providers=CLDR

If you step back to look at the JEP 252: Use CLDR Locale Data by Default, the details follow :

The default lookup order will be CLDR, COMPAT, SPI, where COMPAT designates the JRE's locale data in JDK 9. If a particular provider cannot offer the requested locale data, the search will proceed to the next provider in order.

So, in short if you really don't want the default behaviour to be that of Java-11, you can change the order of lookup with the VM argument

-Djava.locale.providers=COMPAT,CLDR,SPI

What might help further is understanding more about picking the right language using CLDR!

11
rzwitserloot On

I'm sure I'm missing some nuance, but the problem is with your tag, so fix that. Specifically:

ar-EN makes no sense. That's short for:

language = arabic
country = ?? nobody knows.

EN is not a country. en is certainly a language code (for english), but the second part in a language tag is for country, and EN is not a country. (for context, there is en-GB for british english and en-US for american english).

Thus, this is as good as ar (as in, language = arabic, not tied to any particular country). Even if you did tie it to some country, that is mostly immaterial here; that would affect things like 'what is the first day of the week' ,'which currency symbol is to be presumed' and 'should temperatures be stated in Kelvin or Fahrenheit' perhaps. It has no bearing on how to show digits, because that's all based on language.

And language is arabic, thus, ١٢٠ is what you get when you try ar as a language tag when printing the number 120. The problem is that you expect this to return "120" which is a bizarre wish1, combined with the fact that java, unfortunately, shipped with a bug for a long long time that made it act in this bizarre fashion, thinking that rendering the number 120 in arabic is best done with "120", which is wrong.

So, with that context, in order of preference:

Best solution

Find out why your system ends up with ar-EN and nevertheless expects '120', and fix this. Also fix ar-EN in general; EN is not a country.

More generally, 'unsupported locale' isn't really a thing. the ar part is supported, and it's the only relevant part of the tag for rendering digits.

Alternatives

The most likely best answer if the above is not possible is to explicitly work around it. Detect the tag yourself, and write code that will just respond with the result of formatting this number using Locale.ENGLISH instead, guaranteeing that you get Output: 120. The rest seems considerably worse: You could try to write a localization provider which is a ton of work, or you can try to tell java to use the JRE version of the provider, but that one is obsoleted and will not be updated, so you're kicking the can down the road and setting yourself up for a maintenance burden later.

1.) Given that the JRE variant actually printed 120, and you're also indicating you want this, I get that nagging feeling I'm missing some political or historical info and the expectation that ar-EN results in rendering the number 120 as "120" is not so crazy. I'd love to hear that story if you care to provide it!