Find duplicates in array, print count with pair

Question

68 views Asked by martyzee At 15 March 2024 at 14:42

I have an array of value,location pairs:

arr=(test,meta my,amazon test,amazon this,meta test,google my,google hello,microsoft)

I want to print the duplicate values, the number/count of them, along with the location.

For example:

3 test: meta, amazon, google
2 my: amazon, google
1 this: meta
1 hello: microsoft

Here test appears 3 times: in meta, amazon, and google.

So far, this code prints each item with the first location it appears in:

printf '%s\n' "${arr[@]}" | awk -F"," '!_[$1]++'

test,meta
my,amazon
this,meta
hello,microsoft

This prints a count, but it treats each value,location pair as a single value:

printf '%s\n' "${arr[@]}" | sort | uniq -c | sort -r

   1 my,amazon
   1 my,google
   1 this,meta
   1 test,meta
   1 test,google
   1 test,amazon
   1 hello,microsoft
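
The second attempt counts whole lines, so every value,location pair looks unique. A minimal sketch of counting the value alone is to cut away the location before sorting. The name count_values is a hypothetical helper, not something from the question, and this only produces the counts; joining the locations still needs one of the approaches in the answers.

```shell
# count_values is a hypothetical helper name used for illustration.
# It keeps only the first comma-separated field before counting.
count_values() {
  printf '%s\n' "$@" | cut -d, -f1 | sort | uniq -c | sort -rn
}

count_values test,meta my,amazon test,amazon this,meta \
             test,google my,google hello,microsoft
```

With the array from the question, the call would be count_values "${arr[@]}". The most frequent value (test, with a count of 3) comes out first; uniq -c pads the count with leading spaces.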

There are 4 answers

Ed Morton On 15 March 2024 at 14:56

Using GNU awk for arrays of arrays and length(array):

$ cat ./tst.sh
#!/usr/bin/env bash

arr=(test,meta my,amazon test,amazon this,meta test,google my,google hello,microsoft)

printf '%s\n' "${arr[@]}" |
awk -F',' '
    { vals_locs[$1][$2] }
    END {
        for ( val in vals_locs ) {
            out = length(vals_locs[val]) " " val ": "
            sep = ""
            for ( loc in vals_locs[val] ) {
                out = out sep loc
                sep = ", "
            }
            print out
        }
    }
'

$ ./tst.sh
1 hello: microsoft
1 this: meta
2 my: google, amazon
3 test: google, amazon, meta
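
One property of the arrays-of-arrays approach above is that a repeated value,location pair is stored only once, so the count is the number of distinct locations. If that behaviour is wanted without GNU awk, a rough sketch under that assumption is to filter repeated pairs with a seen[] array first. The names dedup_count, seen, cnt and locs are illustrative only.

```shell
# dedup_count is a hypothetical helper: it counts each value,location
# pair once, even if the pair is repeated in the input.
dedup_count() {
  printf '%s\n' "$@" |
  awk -F, '
    # Only act on the first occurrence of each (value, location) pair.
    !seen[$1, $2]++ {
      locs[$1] = (cnt[$1]++ ? locs[$1] ", " : "") $2
    }
    END { for (v in cnt) print cnt[v], v ": " locs[v] }
  ' | sort -rn
}

dedup_count test,meta test,meta test,amazon my,google
# prints:
# 2 test: meta, amazon
# 1 my: google
```

The trailing sort -rn is only there to make the output order predictable, since for (v in cnt) visits keys in an unspecified order.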

oguz ismail On 15 March 2024 at 15:08

With bash only:

arr=(test,meta my,amazon test,amazon this,meta test,google my,google hello,microsoft)

declare -Ai count
declare -A locations

for pair in "${arr[@]}"; do
  value=${pair%%,*}
  location=${pair#*,}
  count[$value]+=1
  locations[$value]+="$location, "
done

for value in "${!locations[@]}"; do
  printf '%d %s: %s\n' "${count[$value]}" "$value" "${locations[$value]%, }"
done
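
The loop above relies on the -i (integer) attribute in declare -Ai: with it, += adds numerically, while on an ordinary array element += appends text. A small sketch of the difference, assuming bash 4 or later for associative arrays (the array names plain and ints are made up for the demonstration):

```shell
#!/usr/bin/env bash
# Requires bash 4+ for associative arrays.
declare -A plain     # ordinary associative array: += appends text
declare -Ai ints     # integer attribute: += adds numerically

plain[k]+=1; plain[k]+=1
ints[k]+=1;  ints[k]+=1

printf 'plain=%s ints=%s\n' "${plain[k]}" "${ints[k]}"   # prints: plain=11 ints=2
```

Note also that "${!locations[@]}" yields keys in no particular order, so the output can be piped through sort -rn if the highest count should come first.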

dawg On 16 March 2024 at 14:12

Here is a Ruby way to do that:

#!/bin/bash

arr=(test,meta my,amazon test,amazon this,meta test,google my,google hello,microsoft)

ruby -F, -lane 'BEGIN{ grps=Hash.new{|h,k| h[k]=[]} }
grps[$F[0]] << $F[1]
END{ grps.each{|k,v| puts "#{v.length} #{k}: #{v.join(", ")}"} }
' <(printf '%s\n' "${arr[@]}")

Or a two-pass awk:

awk -F, 'FNR==NR{cnt[$1]++; line[$1]=$1 in line ? line[$1] ", " $2 : $2; next}
!($1 in p){printf "%s %s: %s\n", cnt[$1],$1,line[$1]; p[$1]}
' <(printf '%s\n' "${arr[@]}") <(printf '%s\n' "${arr[@]}")

Either prints

3 test: meta, amazon, google
2 my: amazon, google
1 this: meta
1 hello: microsoft
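
The two-pass version reads the input twice so that it can print in first-seen order. As a possible alternative, a single pass can keep that order by recording each new value in an indexed array. This is a sketch, not part of the answer above; first_seen_counts, order, cnt and locs are hypothetical names.

```shell
# first_seen_counts is a hypothetical helper: one pass over the input,
# output in the order each value was first encountered.
first_seen_counts() {
  printf '%s\n' "$@" |
  awk -F, '
    # Remember the position at which each value first appears.
    !($1 in cnt) { order[++n] = $1 }
    {
      cnt[$1]++
      locs[$1] = (cnt[$1] > 1 ? locs[$1] ", " : "") $2
    }
    END {
      for (i = 1; i <= n; i++) {
        v = order[i]
        print cnt[v], v ": " locs[v]
      }
    }
  '
}

first_seen_counts test,meta my,amazon test,amazon this,meta \
                  test,google my,google hello,microsoft
# prints:
# 3 test: meta, amazon, google
# 2 my: amazon, google
# 1 this: meta
# 1 hello: microsoft
```

For this input the first-seen order happens to match descending count; for other inputs it will not, and a sort -rn would still be needed to order by count.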

anubhava On 15 March 2024 at 15:03 (accepted answer)

You may consider this solution, which works with any version of awk:

printf '%s\n' "${arr[@]}" |
awk -F, '
{
   row[$1] = (fq[$1]++ ? row[$1] ", " : "") $2
}
END {
   for (k in fq)
      print fq[k], k ":", row[k]
}' | sort -rn -k1

3 test: meta, amazon, google
2 my: amazon, google
1 this: meta
1 hello: microsoft

Note that I have used sort to match the order of your expected output. If you don't care about ordering, you can remove the sort command.
