Why does ruby sort Alphanumeric strings with 1's and 0's as binary?

333 views Asked by At

I have an array ["Q10", "Q100", "Q1000", "Q1000a", "Q1001", "Q98"]. After sorting it, I get the following result:

['Q100', 'Q1000', 'Q1000a','Q98', 'Q10', 'Q1001'].sort
["Q10", "Q100", "Q1000", "Q1000a", "Q1001", "Q98"]

Because of this behaviour, I cannot sort my ActiveRecord objects correctly. I have a model of Question which has a field label. I need to sort it based on label. So Question with label Q1 would be first and the question with label Q1a would follow and so on. I get in a similar order with ActiveRecord described to the above example of array. I am using postgresql as my database.

Now I have 3 questions.

  1. Why alphanumeric string sorting behave that way?
  2. How can I achieve my required sorting without using the sort block?
  3. How can I achieve that sorting in ActiveRecord?
2

There are 2 answers

1
Cary Swoveland On BEST ANSWER

If your array were

arr = ["Q10", "Q100", "Q1000", "Q8", "Q1001", "Q98"]

you could write

arr.sort_by { |s| s[/\d+/].to_i }
  #=> ["Q8", "Q10", "Q98", "Q100", "Q1000", "Q1001"]

If

s = "Q1000"

then

s[/\d+/].to_i
  #=> 1000

See Enumerable#sort_by and String#[].

The regular expression /\d+/ matches a substring of s that contains one or more digits.


If the array were

arr = ["Q10b", "Q100", "Q1000", "Q10a", "Q1001", "Q98", "Q10c"]

you could write

arr.sort_by { |s| [s[/\d+/].to_i, s[/\D+\z/]] }
  #=> ["Q10a", "Q10b", "Q10c", "Q98", "Q100", "Q1000", "Q1001"] 

If

s = "Q10b"

then

[s[/\d+/].to_i, s[/\D+\z/]]
  #=> [10, "b"]

The regular expression /\D+\z/ matches a substring of s that contains one or more non-digits at the end (\z) of the string.

See Array#<=>, specifically the third paragraph, for an explanation of how arrays are ordered when sorting.


If the array were

arr = ["Q10b", "P100", "Q1000", "PQ10a", "Q1001", "Q98", "Q10c"]

you could write

arr.sort_by { |s| [s[/\A\D+/], s[/\d+/].to_i, s[/\D+\z/]] }
  #=> ["P100", "PQ10a", "Q10b", "Q10c", "Q98", "Q1000", "Q1001"]

If

s = "PQ10a"

then

[s[/\A\D+/], s[/\d+/].to_i, s[/\D+\z/]]
  #=> ["PQ", 10, "a"]

The regular expression /\A\D+/ matches a substring of s that contains one or more non-digits at the beginning (\A) of the string.

0
Eyeslandic On

This should do the trick for you, casting them to numbers before sorting.

['100', '1000', '98', '10', '1001'].map(&:to_i).sort

This strange map(&:to_i) is shorthand for map { |x| x.to_i }

Edit:

You could do this with AR. This will throw an error if the column doesn't contain a number disguised as a string.

Model.order("some_column::integer")

Edit II:

Try this if it contains strings as well.

Model.order("cast(some_column as integer))"