Default ordering in C# vs. F#

890 views Asked by At

Consider the two fragments of code that simply order strings in C# and F# respectively:

C#:

var strings = new[] { "Tea and Coffee", "Telephone", "TV" };
var orderedStrings = strings.OrderBy(s => s).ToArray();

F#:

let strings = [| "Tea and Coffee"; "Telephone"; "TV" |]
let orderedStrings =
    strings
    |> Seq.sortBy (fun s -> s)
    |> Seq.toArray

These two fragments of code return different results:

  • C#: Tea and Coffee, Telephone, TV
  • F#: TV, Tea and Coffee, Telephone

In my specific case I need to correlate the ordering logic between these two languages (one is production code, and one is part of a test assertion). This poses a few questions:

  • Is there an underlying reason for the differences in ordering logic?
  • What is the recommended way to overcome this "problem" in my situation?
  • Is this phenomenon specific to strings, or does it apply to other .NET types too?

EDIT

In response to several probing comments, running the fragments below reveals more about the exact nature of the differences of this ordering:

F#:

let strings = [| "UV"; "Uv"; "uV"; "uv"; "Tv"; "TV"; "tv"; "tV" |]
let orderedStrings =
    strings
    |> Seq.sortBy (fun s -> s)
    |> Seq.toArray

C#:

var strings = new[] { "UV", "Uv", "uv", "uV", "TV", "tV", "Tv", "tv" };
var orderedStrings = strings.OrderBy(s => s).ToArray();

Gives:

  • C#: tv, tV, Tv, TV, uv, uV, Uv, UV
  • F#: TV, Tv, UV, Uv, tV, tv, uV, uv

The lexicographic ordering of strings differs because of a difference in the underlying order of characters:

  • C#: "aAbBcCdD...tTuUvV..."
  • F#: "ABC..TUV..Zabc..tuv.."
4

There are 4 answers

1
latkin On BEST ANSWER

See section 8.15.6 of the language spec.

Strings, arrays, and native integers have special comparison semantics, everything else just goes to IComparable if that's implemented (modulo various optimizations that yield the same result).

In particular, F# strings use ordinal comparison by default, in contrast to most of .NET which uses culture-aware comparison by default.

This is obviously a confusing incompatibility between F# and other .NET languages, however it does have some benefits:

  • OCAML compat
  • String and char comparisons are consistent
    • C# Comparer<string>.Default.Compare("a", "A") // -1
    • C# Comparer<char>.Default.Compare('a', 'A') // 32
    • F# compare "a" "A" // 1
    • F# compare 'a' 'A' // 32

Edit:

Note that it's misleading (though not incorrect) to state that "F# uses case-sensitive string comparison". F# uses ordinal comparison, which is stricter than just case-sensitive.

// case-sensitive comparison
StringComparer.InvariantCulture.Compare("[", "A") // -1
StringComparer.InvariantCulture.Compare("[", "a") // -1

// ordinal comparison
// (recall, '[' lands between upper- and lower-case chars in the ASCII table)
compare "[" "A"  // 26
compare "[" "a"  // -6
1
Lawrence On

Thanks to @Richard and his answers for pointing me in the direction for getting a little further on in understanding this problem

My problems seem to have been rooted in not fully understanding the consequences of the comparison constraint in F#. Here is the signature of Seq.sortBy

Seq.sortBy : ('T -> 'Key) -> seq<'T> -> seq<'T> (requires comparison)

My assumption was that if the type 'T implemented IComparable then this would be used in the sorting. I should have consulted this question first: F# comparison vs C# IComparable, which contains some useful references, but which require some further careful reading to fully appreciate what is going on.

So, to attempt to answer my own questions:

Is there an underlying reason for the differences in ordering logic?

Yes. The C# version seems to use the string's implementation of IComparable, whereas the F# version does not.

What is the recommended way to overcome this "problem" in my situation?

Although I cannot comment on whether this is "recommended", the F# function order below will use an implementation of IComparable if there is one on the relevant type:

let strings = [| "UV"; "Uv"; "uV"; "uv"; "Tv"; "TV"; "tv"; "tV" |]
let order<'a when 'a : comparison> (sequence: seq<'a>) = 
    sequence 
    |> Seq.toArray
    |> Array.sortWith (fun t1 t2 ->
        match box t1 with
        | :? System.IComparable as c1 -> c1.CompareTo(t2)
        | _ ->
            match box t2 with
            | :? System.IComparable as c2 -> c2.CompareTo(t1)
            | _ -> compare t1 t2)
let orderedValues = strings |> order

Is this phenomenon specific to strings, or does it apply to other .NET types too?

There are clearly some subtleties involved with the relation between the comparison constraint and the IComparable interface. To be on the safe side I will follow @Richard's advice and always be explicit about the kind of comparison - probably using the function above to "prioritize" using IComparable in the sorting.

5
Richard On

Different libraries make different choices of the default comparison operation on strings. F# is strict defaulting to case sensitivity, while LINQ to Objects is case insensitive.

Both List.sortWith and Array.sortWith allow the comparison to be specified. As does an overload of Enumerable.OrderBy.

However the Seq module does not appear to have an equivalent (and one isn't being added in 4.6).

For the specific questions:

Is there an underlying reason for the differences in ordering logic?

Both orderings are valid. In English cases insensitivity seems more natural, because that's what we're used to. But this does not make it more correct.

What is the recommended way to overcome this "problem" in my situation?

Be explicit about the kind of comparison.

Is this phenomenon specific to strings, or does it apply to other .NET types too?

char will also be affected. And any other type where there is more than one possible ordering (eg. a People type: you could order by name or date of birth depending on the specific requirements).

0
Grundoon On

This has nothing to do with C# vs F#, or even IComparable, but is just due to the different sort implementations in the libraries.

The TL;DR; version is that sorting strings can give different results:

"tv" < "TV"  // false
"tv".CompareTo("TV")  // -1 => implies "tv" *is* smaller than "TV"

Or even clearer:

"a" < "A"  // false
"a".CompareTo("A")  // -1 => implies "a" is smaller than "A"

This is because CompareTo uses the current culture (see MSDN).

We can see how this plays out in practice with some different examples.

If we use the standard F# sort we get the uppercase-first result:

let strings = [ "UV"; "Uv"; "uV"; "uv"; "Tv"; "TV"; "tv"; "tV" ]

strings |> List.sort 
// ["TV"; "Tv"; "UV"; "Uv"; "tV"; "tv"; "uV"; "uv"]

Even if we cast to IComparable we get the same result:

strings |> Seq.cast<IComparable> |> Seq.sort |> Seq.toList
// ["TV"; "Tv"; "UV"; "Uv"; "tV"; "tv"; "uV"; "uv"]

On the other hand if we use Linq from F#, we get the same result as the C# code:

open System.Linq
strings.OrderBy(fun s -> s).ToArray()
// [|"tv"; "tV"; "Tv"; "TV"; "uv"; "uV"; "Uv"; "UV"|]

According to MSDN, the OrderBy method "compares keys by using the default comparer Default."

The F# libraries do not use Comparer by default, but we can use sortWith:

open System.Collections.Generic
let comparer = Comparer<string>.Default

Now when we do this sort, we get the same result as the LINQ OrderBy:

strings |> List.sortWith (fun x y -> comparer.Compare(x,y))
// ["tv"; "tV"; "Tv"; "TV"; "uv"; "uV"; "Uv"; "UV"]

Alternatively, we can use the built-in CompareTo function, which gives the same result:

strings |> List.sortWith (fun x y -> x.CompareTo(y))
// ["tv"; "tV"; "Tv"; "TV"; "uv"; "uV"; "Uv"; "UV"] 

Moral of the tale: If you care about sorting, always specify the specific comparison to use!