I'm looking for the command-line equivalent of a "sort-by" method.

A sample input file (file1.txt):

one
three
five
eleven
thirteen
sixteen

Another input file (file2.txt, which lists the length of the corresponding line in file1.txt):

3
5
4
6
8
7

Desired output (sort lines in file1.txt by lines in file2.txt, in this case numerically; or in other words, sort lines in file1.txt by the line's length):

one
five
three
eleven
sixteen
thirteen

I've created a simple Perl script to do this. Sample usage:

% sort-by-lines file1.txt file2.txt
% sort-by-lines /etc/passwd <(perl -nE'say length' /etc/passwd)

But I was wondering whether a combination of more basic Unix commands (sort, cut, etc.) can do the same in a comparably simple fashion.

1 Answer

Answer by Ed Morton:

Are you just trying to sort a file by the length of each line? With standard tools in any shell on any UNIX box that'd be:

awk -v OFS='\t' '{print length(), NR, $0}' file | sort -k1,1n -k2,2n | cut -f3-

For example:

$ cat file
other stuff
text
foo
stuff
bar

$ awk -v OFS='\t' '{print length(), NR, $0}' file | sort -k1,1n -k2,2n | cut -f3-
foo
bar
text
stuff
other stuff
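
The pipeline above is a decorate-sort-undecorate pattern: prefix each line with its length and line number, sort on those two fields, then strip them again. A minimal sketch that wraps it in a shell function follows. The name sort_by_length is hypothetical, not an existing command. It uses a separate numeric key per field (-k1,1n -k2,2n) so that lines of equal length fall back to their original line order, including in files with ten or more lines.

```shell
# sort_by_length FILE: print the lines of FILE ordered by line length,
# keeping the original order among lines of equal length.
# (sort_by_length is a hypothetical helper name used for illustration.)
sort_by_length() {
    # Decorate: length <TAB> line number <TAB> original line
    awk -v OFS='\t' '{ print length($0), NR, $0 }' "$1" |
        # Sort numerically by length, then numerically by line number
        sort -k1,1n -k2,2n |
        # Undecorate: drop the two helper fields
        cut -f3-
}
```

Usage would be `sort_by_length file`.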

If that's not it then please edit your question to clarify what it is you're trying to do and what your actual question is.


Update - given the input you added to your question:

$ paste file2.txt file1.txt | sort -k1,1n | cut -f2-
one
five
three
eleven
sixteen
thirteen

Note that this won't necessarily preserve the original order of lines that have the same key. To do that, add the -s ("stable") option, which is not in POSIX but is available in GNU sort:

paste file2.txt file1.txt | sort -s -k1,1n | cut -f2-

or number the lines first and use the line number as a tie-breaker, which needs process substitution (bash, ksh, or zsh):

paste file2.txt <(cat -n file1.txt) | sort -k1,1n -k2,2n | cut -f3-

or this, which is portable to all shells and Unixes:

awk -v OFS='\t' 'NR==FNR{a[NR]=$0;next} {print a[FNR], FNR, $0}' file2.txt file1.txt | sort -k1,1n -k2,2n | cut -f3-

or do something with an explicit temp file or a here document.
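
As one possible sketch of the temp-file variant: the function below numbers the data lines into a temporary file, pastes the keys in front, sorts, and strips the helper fields. The function name sort_by_lines and its argument order (data file first, key file second) are assumptions chosen to mirror the usage in the question. It is not the Perl script mentioned there. It also assumes mktemp is available, which is common but not required by POSIX.

```shell
# sort_by_lines DATA KEYS: print the lines of DATA ordered by the numeric
# keys in KEYS (one key per line, matching DATA line for line).
# Lines with equal keys keep their original order.
# (Hypothetical helper; assumes mktemp exists on the system.)
sort_by_lines() {
    numbered=$(mktemp) || return 1
    # Prefix each data line with its line number and a tab
    awk '{ print NR "\t" $0 }' "$1" > "$numbered"
    # key <TAB> line number <TAB> line, sorted by key then line number
    paste "$2" "$numbered" | sort -k1,1n -k2,2n | cut -f3-
    rm -f "$numbered"
}
```

Usage would be `sort_by_lines file1.txt file2.txt`.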