Consider the two fragments of code that simply order strings in C# and F# respectively:
C#:
var strings = new[] { "Tea and Coffee", "Telephone", "TV" };
var orderedStrings = strings.OrderBy(s => s).ToArray();
F#:
let strings = [| "Tea and Coffee"; "Telephone"; "TV" |]
let orderedStrings =
    strings
    |> Seq.sortBy (fun s -> s)
    |> Seq.toArray
These two fragments of code return different results:
- C#: Tea and Coffee, Telephone, TV
- F#: TV, Tea and Coffee, Telephone
In my specific case I need to correlate the ordering logic between these two languages (one is production code, and one is part of a test assertion). This poses a few questions:
- Is there an underlying reason for the differences in ordering logic?
- What is the recommended way to overcome this "problem" in my situation?
- Is this phenomenon specific to strings, or does it apply to other .NET types too?
EDIT
In response to several probing comments, running the fragments below reveals more about the exact nature of the differences of this ordering:
F#:
let strings = [| "UV"; "Uv"; "uV"; "uv"; "Tv"; "TV"; "tv"; "tV" |]
let orderedStrings =
    strings
    |> Seq.sortBy (fun s -> s)
    |> Seq.toArray
C#:
var strings = new[] { "UV", "Uv", "uv", "uV", "TV", "tV", "Tv", "tv" };
var orderedStrings = strings.OrderBy(s => s).ToArray();
Gives:
- C#: tv, tV, Tv, TV, uv, uV, Uv, UV
- F#: TV, Tv, UV, Uv, tV, tv, uV, uv
The lexicographic ordering of strings differs because of a difference in the underlying order of characters:
- C#: "aAbBcCdD...tTuUvV..."
- F#: "ABC..TUV..Zabc..tuv.."
See section 8.15.6 of the language spec.
Strings, arrays, and native integers have special comparison semantics; everything else just goes to IComparable if that's implemented (modulo various optimizations that yield the same result). In particular, F# strings use ordinal comparison by default, in contrast to most of .NET, which uses culture-aware comparison by default.
This is obviously a confusing incompatibility between F# and other .NET languages; however, it does have some benefits. For example, string and char comparisons are consistent with each other in F#, but not in C#:
C#:
Comparer<string>.Default.Compare("a", "A") // -1
Comparer<char>.Default.Compare('a', 'A') // 32
F#:
compare "a" "A" // 1
compare 'a' 'A' // 32
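As for the "recommended way to overcome this" part of the question: one option is to make the C# side ordinal as well, by passing an explicit comparer to OrderBy. The following is only a minimal sketch; the class and method names (OrdinalSortDemo, SortOrdinal) are made up for illustration, and it assumes you are free to change the C# side rather than the F# side.

```csharp
using System;
using System.Linq;

public static class OrdinalSortDemo
{
    // Hypothetical helper: sorts with ordinal comparison (UTF-16 code unit
    // order) instead of the culture-aware default that OrderBy uses.
    public static string[] SortOrdinal(string[] strings)
    {
        return strings.OrderBy(s => s, StringComparer.Ordinal).ToArray();
    }

    public static void Main()
    {
        var strings = new[] { "Tea and Coffee", "Telephone", "TV" };

        // Prints: TV, Tea and Coffee, Telephone
        Console.WriteLine(string.Join(", ", SortOrdinal(strings)));
    }
}
```

With the explicit StringComparer.Ordinal, the C# result matches the F# ordering shown in the question. If it is easier to change the F# side instead, the same idea works in reverse: sort with a culture-aware comparison there rather than the built-in compare.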
Edit:
Note that it's misleading (though not incorrect) to state that "F# uses case-sensitive string comparison". F# uses ordinal comparison, which is stricter than just case-sensitive.
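To illustrate "stricter than just case-sensitive", here is a small sketch comparing two strings that render identically but are encoded differently. The class and method names are hypothetical, and the culture-aware result assumes a runtime with normal globalization support (in globalization-invariant mode, culture comparisons fall back to ordinal behaviour).

```csharp
using System;

public static class OrdinalVsCultureDemo
{
    // 'é' as a single precomposed code point (U+00E9).
    public const string Precomposed = "\u00E9";

    // 'e' followed by a combining acute accent (U+0065 U+0301).
    public const string Decomposed = "e\u0301";

    // Culture-aware and case-sensitive: applies linguistic rules, so
    // canonically equivalent sequences are normally treated as equal.
    public static bool EqualUnderInvariantCulture()
    {
        return string.Equals(Precomposed, Decomposed, StringComparison.InvariantCulture);
    }

    // Ordinal: compares raw UTF-16 code units, so the two encodings differ.
    public static bool EqualUnderOrdinal()
    {
        return string.Equals(Precomposed, Decomposed, StringComparison.Ordinal);
    }

    public static void Main()
    {
        Console.WriteLine($"InvariantCulture: {EqualUnderInvariantCulture()}");
        Console.WriteLine($"Ordinal: {EqualUnderOrdinal()}");
    }
}
```

Neither comparison ignores case, yet only the ordinal one is guaranteed to tell these two strings apart, which is the sense in which ordinal is the stricter of the two.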