I found some data that seems to behave strangely in 'sort'. When doing a numeric sort on the first field of a CSV file, the presence or absence of the 4th column causes the 7th line to be sorted incorrectly.
I'm using GNU sort 8.21 on Slackware64-current.
Data: https://gist.github.com/anonymous/2a7beb4871b25ae8f8b3
This works:
cut -d , -f 1-3 < weird.csv | sort -t , -k 1n
This does not work:
cat weird.csv | sort -t , -k 1n
The 7th line seems to be sorted incorrectly.
I can't seem to find any obvious explanation for this behavior. Using 'g' instead of 'n' has the behavior I would expect, but I'm not clear on what the difference is between 'g' and 'n'.
I found out what I was doing wrong. Detailed explanation provided here: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=19021
In short, I should have used '-k 1,1n' so that the sort key starts and ends at field 1. Because I didn't specify an ending field, the key ran from field 1 to the end of the line, and because my locale silently ignores commas in numbers (it treats them as thousands separators), sort wasn't comparing the numbers I thought it was comparing.
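For anyone who wants to see the difference between the two key specifications, here is a minimal sketch. The three sample rows are made up for illustration and are not the contents of weird.csv, and the output of the first command depends on your locale, so I have not listed it.

```shell
# Hypothetical sample rows (not taken from weird.csv).
data='10,1,a
9,500,b
2,3,c'

# Open-ended key: runs from field 1 to the end of the line.
# In a locale where the comma is the thousands separator (en_US.UTF-8,
# for example), sort may read "9,500,b" as 9500 and "10,1,a" as 101.
printf '%s\n' "$data" | sort -t , -k 1n

# Closed key: starts and ends at field 1, so only 10, 9 and 2 are compared.
printf '%s\n' "$data" | sort -t , -k 1,1n
```

The second command should order the rows by first field as 2, 9, 10 regardless of locale. GNU sort also has a --debug option that marks the part of each line used as the key, which makes this kind of mistake easier to spot.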