Why does 'sort' seem to sort a field incorrectly based on the presence or absence of a different field?

Question

Asked by Sitwon on 11 November 2014 at 16:31

I found some data that seems to behave strangely in 'sort'. When I do a numeric sort on the first field of a CSV file, the presence or absence of the 4th column causes the 7th line to be sorted incorrectly.

I'm using GNU sort 8.21 on Slackware64-current.

Data: https://gist.github.com/anonymous/2a7beb4871b25ae8f8b3

This works:

cut -d , -f 1-3 < weird.csv | sort -t , -k 1n

This does not work:

cat weird.csv | sort -t , -k 1n

The 7th line seems to be sorted incorrectly.

I can't seem to find any obvious explanation for this behavior. Using 'g' instead of 'n' has the behavior I would expect, but I'm not clear on what the difference is between 'g' and 'n'.

Answer

**Sitwon** · Accepted Answer · 2014-11-11T19:59:50+00:00

I found out what I was doing wrong. A detailed explanation is in the GNU bug report: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=19021

In short, I should have used '-k 1,1n' to specify that the sort key starts and ends at field 1. Because I didn't specify an ending field, the key ran from field 1 to the end of the line. My locale treats the comma as a thousands separator, which numeric sort ('-n') silently skips, so sort wasn't comparing the numbers I thought it was comparing.
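A minimal sketch of the fix, assuming some made-up sample rows (they are hypothetical, not the contents of the gist). With `-k 1,1n` the key ends at field 1, so no later field can leak into the numeric comparison. The script pins `LC_ALL=C` so the result does not depend on the caller's locale. It also runs the same bounded key with `-g`. As far as I know, `-g` converts a floating-point prefix of the key and does not skip thousands separators, so it stops at the first comma. That would explain why `-k 1g` appeared to behave even without an end field.

```shell
#!/bin/sh
# Hypothetical sample rows (not the question's data): numeric first field,
# followed by more comma-separated fields.
sample='10,1,x
9,50,y
100,2,z'

# Fixed form from the accepted answer: the key starts AND ends at field 1.
# LC_ALL=C pins the locale so the output is reproducible on any machine.
sorted=$(printf '%s\n' "$sample" | LC_ALL=C sort -t , -k 1,1n)

# Same bounded key with general-numeric (-g) comparison, for contrast with -n.
sorted_g=$(printf '%s\n' "$sample" | LC_ALL=C sort -t , -k 1,1g)

printf '%s\n' "$sorted"
```

Both variables should hold `9,50,y`, `10,1,x`, `100,2,z` in that order. In a locale whose thousands separator is a comma (en_US.UTF-8, for example, if it is installed), the original open-ended `-k 1n` would instead read the keys as roughly 101, 950 and 1002. That would put `10,1,x` ahead of `9,50,y`. GNU sort's `--debug` option underlines the part of each line it actually uses as the key, which makes this kind of mistake visible.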

