Why is "1,2,3" parsed as 123 by Java's en-US DecimalFormat?


Parsing "1,2,3" with the en-US decimal format yields the number 123.

import java.text.*;
import java.util.Locale;

public class Main{

    public static void main(String[] args) throws ParseException {

        final String text = "1,2,3";
        final int weirdResult = 123;

        final NumberFormat usDecimalFormat = DecimalFormat.getNumberInstance(Locale.US);
        final Number parsedNumber = usDecimalFormat.parse(text);

        if(parsedNumber.doubleValue() == weirdResult){
            System.out.println(text + " is bizarrely parsed as " + weirdResult);
        } else {
            System.out.println(text + " is parsed as " + parsedNumber);
        }
    }
}


The above code actually prints 1,2,3 is bizarrely parsed as 123.

How is "1,2,3" a valid number, and why is its value 123?


There are 2 answers

Rogue

DecimalFormat has an internal method, #subparse, which interprets separators and other special characters (depending on the Locale you passed in). It tracks the characters of interest (digits, the decimal separator, currency symbols), but what matters here is how "grouping separators" (e.g. , in the US and UK, . in Germany) are handled during the parse:

char grouping = symbols.getGroupingSeparator();
//...
for (; position < text.length(); ++position) {
    char ch = text.charAt(position);
    //...
    } else if (!isExponent && ch == grouping && isGroupingUsed()) {
        if (sawDecimal) {
            break;
        }
        // Ignore grouping characters, if we are using them, but
        // require that they be followed by a digit.  Otherwise
        // we backup and reprocess them.
        backup = position;
    } //...
}

Source: DecimalFormat.java:2290 (Note: The above was refactored into #subparseNumber as of JDK 12)

Note that the break ends parsing of the number, and because nothing is appended to digitList in this branch, grouping separators are effectively ignored. Moreover, they are ignored no matter how many digits have been parsed since the previous separator. The only check tied to the separator's position is whether it appears after the decimal separator, in which case the break stops parsing there.
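To see both behaviours from the outside, here is a small sketch that parses with a ParsePosition and reports where parsing stopped. The class name GroupingLeniencyDemo and the helper describe are made up for illustration; the expected values follow from the loop quoted above and assume the default (lenient) parsing mode.

```java
import java.text.NumberFormat;
import java.text.ParsePosition;
import java.util.Locale;

// Hypothetical demo class, not part of the JDK.
class GroupingLeniencyDemo {

    // Parses text with the en-US number format and reports the value
    // together with the index at which parsing stopped.
    static String describe(String text) {
        NumberFormat format = NumberFormat.getNumberInstance(Locale.US);
        ParsePosition position = new ParsePosition(0);
        Number value = format.parse(text, position);
        return value + " (stopped at index " + position.getIndex() + ")";
    }

    public static void main(String[] args) {
        // Group size is irrelevant: every comma before the decimal point is skipped.
        System.out.println(describe("1,2,3")); // 123 (stopped at index 5)
        System.out.println(describe("12,34")); // 1234 (stopped at index 5)
        // A comma after the decimal separator hits the break and ends the number.
        System.out.println(describe("1.5,3")); // 1.5 (stopped at index 3)
    }
}
```

The last case is worth noticing: parse(String) does not throw there either, it just returns 1.5 and leaves ",3" unconsumed.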

Stephen C

The other answer explains how the parsing code does this, but not why. Is it a bug? Is "1,2,3" a valid number?

The reason is that the designers decided to treat the "grouping" character (which in EN locales is ',') leniently when parsing. After all, the use of grouping characters ... and the number of digits in the groups ... are only conventions. Lenient parsing is generally a good idea when dealing with inputs whose meaning is not ambiguous. (Though that is debatable here.)

Basically, in most contexts the grouping character is quietly ignored when parsing. The javadoc for the DecimalFormat.parse method doesn't mention this. However, the javadoc is largely silent on the format that the method accepts, so the behavior is not inconsistent with the documentation.

This was reported as bug JDK-7049000 against Java 6 back in 2011, but you can see the bug report has received no attention.

And here's the problem. If they did try to do something to fix this bug / feature, the fix would be liable to cause problems for existing applications. These apps could reject inputs that they previously accepted ... for no particularly good reason from the users' point of view.


So, my advice would be to accept DecimalFormat for what it is. If you specifically don't want to accept numbers with misplaced grouping characters:

  • use DecimalFormat.setDecimalFormatSymbols to set the grouping separator to some character that can never appear, or

  • validate or split your input numbers with (say) a regex prior to parsing them.
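As a rough sketch of the second option: the class StrictUsNumberParser, the method parseStrict and the regex below are all hypothetical, and the pattern assumes en-US conventions (',' for grouping in groups of three, '.' as the decimal separator, optional leading '-'). It validates the shape first, then checks that DecimalFormat consumed the whole input.

```java
import java.text.NumberFormat;
import java.text.ParsePosition;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the JDK.
class StrictUsNumberParser {

    // Assumed shape: optional minus sign, then either comma-separated groups
    // of exactly three digits or plain digits, then an optional fraction.
    private static final Pattern WELL_FORMED =
            Pattern.compile("-?(\\d{1,3}(,\\d{3})+|\\d+)(\\.\\d+)?");

    static Number parseStrict(String text) {
        if (!WELL_FORMED.matcher(text).matches()) {
            throw new NumberFormatException("Malformed number: " + text);
        }
        NumberFormat format = NumberFormat.getNumberInstance(Locale.US);
        ParsePosition position = new ParsePosition(0);
        Number value = format.parse(text, position);
        // Reject partial parses: the whole input must have been consumed.
        if (value == null || position.getIndex() != text.length()) {
            throw new NumberFormatException("Unparsed input in: " + text);
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(parseStrict("1,234.5")); // accepted
        try {
            parseStrict("1,2,3");
        } catch (NumberFormatException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

The full-consumption check matters for the first option too. As far as I can tell, once the comma is no longer treated as a grouping separator, parse(String) on "1,2,3" stops at the first comma and returns 1 rather than throwing, so you would still need to compare the ParsePosition index against the input length.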