Comparing a German word containing umlauts with the same word written with Unicode escapes

I'm translating strings from English to German, but German strings that have already been translated are being translated again.

Say I have the string "Beim Hinzuf\u00E4gen", which has already been translated. I want to compare it to the same string written with a literal umlaut, "Beim Hinzufügen". Both files are read as ISO-8859-1, but when I compare the strings they are treated as different, so the string gets translated again, which I don't want. Even when I replace the umlaut with the Unicode escape and compare the two, they are still treated as different. I suspect this is because an extra backslash is added when I replace the umlaut with "\u00E4".

Does anyone know the preferred way to do this?

There are 2 answers

IQV (best answer)

As @Eugene points out, your result is correct. You are comparing "Hinzufügen" with "Hinzufägen", which are different strings:

Unicode 00E4 is "ä",
Unicode 00FC is "ü".

So the escape that matches "Hinzufügen" is \u00FC, not \u00E4.
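
To make the mismatch concrete, here is a minimal sketch. The class name UmlautEscapeCheck and the helper sameText are made up for illustration, and it assumes the source file is compiled as UTF-8. It only shows that a plain equals() already works once the escape denotes the same letter as the literal.

```java
public class UmlautEscapeCheck {
    // Hypothetical helper: String.equals compares the decoded characters,
    // so an escape and a literal letter match only if they are the same code point.
    static boolean sameText(String a, String b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        String literal = "Beim Hinzufügen";
        String wrongEscape = "Beim Hinzuf\u00E4gen"; // U+00E4 is a-umlaut
        String rightEscape = "Beim Hinzuf\u00FCgen"; // U+00FC is u-umlaut

        System.out.println(sameText(literal, wrongEscape)); // false
        System.out.println(sameText(literal, rightEscape)); // true
    }
}
```

Note that this covers escapes written in Java source, which the compiler decodes. If the six characters of the escape are read as raw text from a file, they stay as they are unless something decodes them.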

Eugene

It seems that you need to compare these with a Collator:

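import java.text.Collator;

String left = "Beim Hinzuf\u00E4gen";
String right = "Beim Hinzufägen";
// Collator for the default locale; at PRIMARY strength it typically
// ignores case and accent differences
Collator c = Collator.getInstance();
c.setStrength(Collator.PRIMARY);

int result = c.compare(left, right); // 0, the strings are considered equal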
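
For what it's worth, here is a rough sketch of how the strength setting plays out on the strings from the question. The class name CollatorStrengthDemo, the helper equivalent, and the choice of Locale.GERMAN are assumptions for illustration, not something from the original post. The strings are written with escapes so the result does not depend on the source file encoding.

```java
import java.text.Collator;
import java.util.Locale;

public class CollatorStrengthDemo {
    // Hypothetical helper: true when a collator for Locale.GERMAN (an assumed
    // locale) treats both strings as equivalent at the given strength.
    static boolean equivalent(String a, String b, int strength) {
        Collator collator = Collator.getInstance(Locale.GERMAN);
        collator.setStrength(strength);
        return collator.compare(a, b) == 0;
    }

    public static void main(String[] args) {
        String withU = "Beim Hinzuf\u00FCgen"; // u-umlaut, the intended word
        String withA = "Beim Hinzuf\u00E4gen"; // a-umlaut, the escape from the question

        // Identical text is equivalent at any strength.
        System.out.println(equivalent(withU, withU, Collator.PRIMARY)); // true

        // Different base letters (u vs a) remain different even at PRIMARY.
        System.out.println(equivalent(withU, withA, Collator.PRIMARY)); // false

        // Case and accent handling is locale-dependent, so the result is
        // printed here rather than asserted.
        System.out.println(equivalent(withU, "beim hinzufugen", Collator.PRIMARY));
    }
}
```

A collator at PRIMARY strength is looser than equals(), which may or may not be what you want when deciding whether a string was already translated. It will not make the a-umlaut and u-umlaut spellings match, so the escape still has to be corrected.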