What is the theory behind Unicode sorting? I understand how it works, but I don't understand why this standard was chosen for collation.
It seems that when you compare two strings, for example with ucol_strcollIter():
ucol_strcollIter(collator, &stringIter1, &stringIter2, &Status)
Then, say the two strings are:
string string1 = "hello"
string string2 = "héllo"
Under the "Secondary" collation strength, string1 should be ordered before string2, because the two strings differ only at the secondary level.
<1 hello
<2 héllo
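To make the "primary first, then secondary" idea concrete, here is a minimal sketch of a multi-level comparison. It is a toy model, not ICU: the weight table, `sort_key` and `compare` are hypothetical names, and the numbers are invented (real weights come from the Unicode collation tables).

```python
# Toy collation elements: (primary, secondary) weights per character.
# These numbers are invented for illustration only.
TOY_WEIGHTS = {
    "h": (10, 1),
    "e": (7, 1),
    "é": (7, 2),   # same primary as "e", heavier secondary (the accent)
    "l": (14, 1),
    "o": (17, 1),
}

def sort_key(text, strength=2):
    """Build a key with all primary weights first, then all secondary weights."""
    elements = [TOY_WEIGHTS[ch] for ch in text]
    key = [p for p, _ in elements]
    if strength >= 2:
        key.append(0)  # level separator, lower than any real weight
        key.extend(s for _, s in elements)
    return key

def compare(a, b, strength=2):
    """Return -1, 0 or 1, like a strcoll-style comparison."""
    ka, kb = sort_key(a, strength), sort_key(b, strength)
    return (ka > kb) - (ka < kb)

print(compare("hello", "héllo"))              # secondary strength
print(compare("hello", "héllo", strength=1))  # primary strength only
```

At primary strength the two strings are equal in this model; the accent only breaks the tie once the secondary weights are appended to the key.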
BUT
If you have trailing spaces, like:
string string1 = "hello "
string string2 = "héllo "
then the accented hello (string2) is placed before string1, and the two are distinguished at the primary level.
<1 héllo
<1 hello
Why does the Unicode collation algorithm take trailing spaces into account?
Is there some reason behind this?
Probably the best TP would be this.
You can try various option combinations with the ICU Collation Demo (give "alternate=shifted" a try).
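As a rough illustration of what the alternate setting changes, here is a toy sketch (not ICU; the weights and the names `WEIGHTS`, `VARIABLE`, `sort_key` and `compare` are made up for the example). It assumes the space is a "variable" character: with "non-ignorable" handling it carries a primary weight like any letter, while with "shifted" handling it is skipped at the primary and secondary levels.

```python
# Characters treated as "variable" in this toy model.
VARIABLE = {" "}

# Invented (primary, secondary) weights; real tables differ.
WEIGHTS = {
    " ": (2, 1),
    "h": (10, 1), "e": (7, 1), "é": (7, 2), "l": (14, 1), "o": (17, 1),
}

def sort_key(text, alternate="non-ignorable"):
    """Secondary-strength key: primaries, a separator, then secondaries."""
    elements = []
    for ch in text:
        if alternate == "shifted" and ch in VARIABLE:
            continue  # shifted: the space is ignored at levels 1 and 2
        elements.append(WEIGHTS[ch])
    primaries = [p for p, _ in elements]
    secondaries = [s for _, s in elements]
    return primaries + [0] + secondaries

def compare(a, b, alternate="non-ignorable"):
    """Return -1, 0 or 1, like a strcoll-style comparison."""
    ka, kb = sort_key(a, alternate), sort_key(b, alternate)
    return (ka > kb) - (ka < kb)

# A trailing space on only one string: the space's primary weight decides
# when it is non-ignorable, the accent decides when it is shifted.
print(compare("héllo", "hello "))
print(compare("héllo", "hello ", alternate="shifted"))
```

In this model a space that is not ignored creates a primary difference, and a primary difference always outranks an accent. That is only a sketch of the mechanism; the demo above is the place to check what ICU actually does with your strings and settings.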