Print lines that have no duplicates in a file and preserve input order (Linux)


I have the following file:

2
1
4
3
2
1

I want the output like this (only the lines that have no duplicates, preserving their original order):

4
3

I tried sort file.txt | uniq -u. It works, but the output is sorted:

3
4

I tried awk '!x[$0]++' file.txt. It keeps the order, but it prints every value once instead of dropping the values that have duplicates:

2
1
4
3

There are 5 answers

markp-fuso (2 votes, accepted answer)

A couple ideas to choose from:

a) read the input file twice (only the counts are held in memory; FNR==NR is true only while the first copy of the file is being read, because FNR resets for each input file while NR keeps counting):

awk '
FNR==NR         { counts[$0]++; next }  # 1st pass: keep count
counts[$0] == 1                         # 2nd pass: print rows with count == 1
' file.txt file.txt

b) read the input file once (every line is held in memory until the END block):

awk '
    { lines[NR] = $0                    # maintain ordering of rows
      counts[$0]++
    }
END { for ( i=1;i<=NR;i++ )             # run thru the indices of the lines[] array and ...
          if ( counts[lines[i]] == 1 )  # if the associated count == 1 then ...
             print lines[i]             # print the array entry to stdout
    }
' file.txt

Both of these generate:

4
3
pmf (6 votes)

Here's an approach using only awk that reads the input just once, yet doesn't need to store the entire file in memory (only one entry per distinct line):

  • fo records each line's first occurrence: if the line isn't registered yet (!fo[$0]), its line number is saved (fo[$0]=NR).
  • fq counts each line's frequency and is incremented for every line read (fq[$0]++).
  • The not-yet-incremented value of fq[$0] doubles as a condition: it is 0 (false) only on a line's first occurrence, and truthy on every repetition, in which case the line's first-occurrence record is discarded (delete fo[$0]).
  • At the end, fo contains only the lines of relevance (those occurring exactly once), with the lines' contents as indices and the line numbers of their first occurrences as values. So, to finish, only the array's indices need to be printed in ascending order of their numeric values. One way to achieve this is asorti (available in GNU Awk 4+) with the predefined sort ordering "@val_num_asc", which sorts numerically by value in ascending order.
awk '
  !fo[$0]  { fo[$0] = NR }       # register the line's first occurrence
  fq[$0]++ { delete fo[$0] }     # line repeated: discard its record
  END      { n = asorti(fo, fo, "@val_num_asc")   # order by first-occurrence line number
             for (i = 1; i <= n; i++) print fo[i] }
' file.txt
4
3
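
For reference, a minimal standalone sketch (not part of the answer; the array contents are invented for the demo) of what asorti with "@val_num_asc" does: the destination array receives the source's indices as values, ordered by the source's numeric element values.

gawk 'BEGIN {
  a["x"] = 30; a["y"] = 10; a["z"] = 20   # index -> numeric value
  n = asorti(a, b, "@val_num_asc")        # b[1..n] = indices of a, ordered by value
  for (i = 1; i <= n; i++) print b[i]     # prints: y, z, x
}'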
amphetamachine (2 votes)

Using only Bash built-ins (Bash 4+, for the associative array), you can do this in just a few lines:

declare -A SEEN=()                          # associative array: line -> count
while IFS= read -r LINE; do
    (( ++SEEN[_$LINE] ))                    # first pass: count every line
done < file.txt
while IFS= read -r LINE; do
    if [[ ${SEEN[_$LINE]} -eq 1 ]]; then    # second pass: print lines seen exactly once
        printf -- '%s\n' "$LINE"
    fi
done < file.txt

Note: The _$LINE as the subscript is to handle empty lines correctly.
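
For illustration, a minimal sketch (hypothetical, not from the answer) of the failure the prefix avoids: inside (( )), $LINE is expanded before the arithmetic is parsed, so an empty line would leave an empty subscript behind.

declare -A SEEN=()
LINE=''
# (( ++SEEN[$LINE] ))    # expands to (( ++SEEN[] )): bad array subscript
(( ++SEEN[_$LINE] ))     # expands to (( ++SEEN[_] )): a valid key
echo "${SEEN[_]}"        # prints 1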

dawg (0 votes)

Here is a Ruby solution:

ruby -lne 'BEGIN{cnt=Hash.new {|h,k| h[k] = 0} } 
cnt[$_]+=1
END{puts cnt.select{|k,v| v==1}.keys.join("\n") }
' file 

Prints:

4
3

Or, in one read of the file:

ruby -e 'puts $<.read.split(/\R+/).
            group_by{|x| x}.select{|k,v| v.length==1}.keys.join("\n")
' file 
# same output

Unlike awk arrays, Ruby hashes maintain insertion order.
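
A minimal one-pass sketch relying on that insertion order (my variant, assuming the same input file, not code from the answer):

ruby -lne 'BEGIN { cnt = Hash.new(0) }   # default count of 0
cnt[$_] += 1                             # count each line
END { cnt.each { |line, n| puts line if n == 1 } }
' file
# same output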

If you want a one pass awk you could do:

awk 'BEGIN{OFS="\t"}
{ if (seen[$0]++) delete order[$0]; else order[$0]=FNR } 
END { for ( e in order ) print order[e], e } ' file | sort -nk 1,1 | cut -f2-
# same output

(Thanks Ed Morton for a better awk!)

pmf (0 votes)

I tried sort file.txt | uniq -u. It works, but the output is sorted

You could take that output and use it as a list of newline-delimited patterns for grep -f on the original file. Use -Fx to match the patterns as whole-line fixed strings (not regular expressions). Note that this reads file.txt twice: once through sort and once through grep.

sort file.txt | uniq -u | grep -Fxf- file.txt
4
3