bash sort ignoring non-alpha characters


I'm trying to extract a list of unique tags from a tagged-text file. Tags are delimited by angle brackets, and each tag name starts with a colon: <:ttx>, <:ol_2> and so on.

I started by adding a line-break after each >, then tried sort. The results baffled me, until I realized that sort was ignoring the first two characters.

Is there a switch I need to add, or is my Ubuntu-flavoured bash behaving as if I had passed sort -d without my asking for it?


There is 1 answer

oHo On

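Set LANG=C to disable your locale's collation rules. In most UTF-8 locales sort typically gives little or no weight to punctuation such as < and : when comparing lines, which matches what you are seeing; in the C locale it compares raw byte values instead. If LC_ALL is set in your environment it overrides LANG, so use LC_ALL=C in that case. Note that the pattern has to allow more than one character, plus the underscore, to match tags like <:ol_2>: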

grep -oE '<:[A-Za-z0-9_]+>' your-tagged-text-file | LANG=C sort -u
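
If you want to try this without touching your real file, here is a minimal sketch. The sample text and the tag names in it are made up for illustration, and it assumes a grep that supports -o and -E (GNU and BSD grep both do). It uses LC_ALL=C rather than LANG=C only because LC_ALL wins over any other locale variable that may already be set.

```shell
#!/usr/bin/env bash
# Hypothetical sample input: the text and tag names are invented for this demo.
sample=$(mktemp)
printf '%s\n' 'Intro <:ttx>text<:ol_2> more <:ttx> and <:Body>end' > "$sample"

# Pull out every <:name> tag, one per line, then sort by raw byte value
# and drop duplicates. LC_ALL=C overrides LANG and LC_COLLATE if they are set.
tags=$(grep -oE '<:[A-Za-z0-9_]+>' "$sample" | LC_ALL=C sort -u)
printf '%s\n' "$tags"

# Clean up the temporary file.
rm -f "$sample"
```

In the C locale uppercase letters sort before lowercase ones, so this prints <:Body>, <:ol_2> and <:ttx>, each on its own line. There is also no need to insert a line-break after each > first, since grep -o already puts each match on a separate line.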