Find duplicate values in an array and print each count with its locations


I have an array of value,location pairs:

arr=(test,meta my,amazon test,amazon this,meta test,google my,google hello,microsoft)

I want to print each value, how many times it appears, and the locations it appears in.

For example:

3 test: meta, amazon, google
2 my: amazon, google
1 this: meta
1 hello: microsoft

Here, test appears 3 times: in meta, amazon, and google.

So far, this code prints each value with only the first location it appears in:

printf '%s\n' "${arr[@]}" | awk -F"," '!_[$1]++'
test,meta
my,amazon
this,meta
hello,microsoft

This prints a count, but it treats each whole value,location string as a single item:

printf '%s\n' "${arr[@]}" | sort | uniq -c | sort -r
   1 my,amazon
   1 my,google
   1 this,meta
   1 test,meta
   1 test,google
   1 test,amazon
   1 hello,microsoft
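Dropping the location before uniq -c makes it count by value alone. A minimal sketch (the helper name count_values is hypothetical); it gives the counts but not the locations, and uniq -c pads each count with leading spaces:

```shell
#!/usr/bin/env bash
# Sketch: keep only the text before the first comma (the value),
# then count identical values and sort by count, highest first.
count_values() {
  printf '%s\n' "$@" | cut -d, -f1 | sort | uniq -c | sort -rn
}

arr=(test,meta my,amazon test,amazon this,meta test,google my,google hello,microsoft)
count_values "${arr[@]}"
```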
anubhava (accepted answer)

You may consider this solution, which should work with any version of awk:

printf '%s\n' "${arr[@]}" |
awk -F, '
{
   row[$1] = (fq[$1]++ ? row[$1] ", " : "") $2
}
END {
   for (k in fq)
      print fq[k], k ":", row[k]
}' | sort -rn -k1

3 test: meta, amazon, google
2 my: amazon, google
1 this: meta
1 hello: microsoft

Note that I used sort only to match the order of your expected output. If you don't care about ordering, you can remove the sort command.
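With sort -rn -k1, lines with equal counts fall back to a reversed whole-line comparison, which is why "1 this" comes before "1 hello". If you would rather break ties alphabetically, one option is to sort numerically on the first field only and then on the rest of the line. A sketch wrapping the same awk program in a hypothetical function group_sorted:

```shell
#!/usr/bin/env bash
# Sketch: same awk program as above; only the final sort differs.
# -k1,1nr sorts by the count (numeric, descending); -k2 breaks ties
# by the rest of the line in ascending order.
group_sorted() {
  printf '%s\n' "$@" |
  awk -F, '
  {
     row[$1] = (fq[$1]++ ? row[$1] ", " : "") $2
  }
  END {
     for (k in fq)
        print fq[k], k ":", row[k]
  }' | sort -k1,1nr -k2
}

arr=(test,meta my,amazon test,amazon this,meta test,google my,google hello,microsoft)
group_sorted "${arr[@]}"
```

Here the two count-1 lines print as "1 hello: microsoft" followed by "1 this: meta".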

Ed Morton

Using GNU awk for arrays of arrays and length(array):

$ cat ./tst.sh
#!/usr/bin/env bash

arr=(test,meta my,amazon test,amazon this,meta test,google my,google hello,microsoft)

printf '%s\n' "${arr[@]}" |
awk -F',' '
    { vals_locs[$1][$2] }
    END {
        for ( val in vals_locs ) {
            out = length(vals_locs[val]) " " val ": "
            sep = ""
            for ( loc in vals_locs[val] ) {
                out = out sep loc
                sep = ", "
            }
            print out
        }
    }
'

$ ./tst.sh
1 hello: microsoft
1 this: meta
2 my: google, amazon
3 test: google, amazon, meta
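Because this version stores each location as an array index, a value,location pair that occurs twice is counted only once, and for (val in vals_locs) visits values in an unspecified order. If GNU awk isn't available, a minimal sketch of the same de-duplication in POSIX awk, which also prints values in the order they first appear, could look like this (the function name count_distinct_locations is hypothetical):

```shell
#!/usr/bin/env bash
# Sketch (POSIX awk, no arrays of arrays): skip repeated value,location
# pairs, count the distinct locations per value, keep first-seen order.
count_distinct_locations() {
  printf '%s\n' "$@" |
  awk -F',' '
    !seen[$0]++ {                          # first time this exact pair is seen
      if (!($1 in cnt))
        order[++n] = $1                    # remember first-seen order of values
      locs[$1] = (($1 in cnt) ? locs[$1] ", " : "") $2
      cnt[$1]++
    }
    END {
      for (i = 1; i <= n; i++)
        print cnt[order[i]], order[i] ": " locs[order[i]]
    }'
}

# test,meta appears twice here but is counted once
arr=(test,meta my,amazon test,amazon this,meta test,google my,google hello,microsoft test,meta)
count_distinct_locations "${arr[@]}"
```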
oguz ismail

With bash only (associative arrays require bash 4 or later):

arr=(test,meta my,amazon test,amazon this,meta test,google my,google hello,microsoft)

declare -Ai count
declare -A locations

for pair in "${arr[@]}"; do
  value=${pair%%,*}
  location=${pair#*,}
  count[$value]+=1
  locations[$value]+="$location, "
done

for value in "${!locations[@]}"; do
  printf '%d %s: %s\n' ${count[$value]} "$value" "${locations[$value]%, }"
done
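Bash does not guarantee any particular order for "${!locations[@]}", so the loop above may print the values in any order. A sketch that also keeps first-seen order by recording each new value in an indexed array; the function name group_pairs is hypothetical, and it assumes bash 4 or later:

```shell
#!/usr/bin/env bash
# Sketch (bash 4+): group value,location pairs, printing values in the
# order they first appear in the input.
group_pairs() {
  local -A count=() locations=()
  local -a order=()
  local pair value location
  for pair in "$@"; do
    value=${pair%%,*}            # text before the first comma
    location=${pair#*,}          # text after the first comma
    if [[ -z ${count[$value]+set} ]]; then
      order+=("$value")          # first occurrence of this value
    fi
    count[$value]=$(( ${count[$value]:-0} + 1 ))
    locations[$value]+="$location, "
  done
  for value in "${order[@]}"; do
    # strip the trailing ", " added after the last location
    printf '%d %s: %s\n' "${count[$value]}" "$value" "${locations[$value]%, }"
  done
}

arr=(test,meta my,amazon test,amazon this,meta test,google my,google hello,microsoft)
group_pairs "${arr[@]}"
```

Because the location is taken with ${pair#*,}, a location that itself contains a comma is kept whole rather than truncated.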
dawg

Here is a Ruby solution:

#!/bin/bash

arr=(test,meta my,amazon test,amazon this,meta test,google my,google hello,microsoft)

ruby -F, -lane 'BEGIN{ grps=Hash.new{|h,k| h[k]=[]} }
grps[$F[0]] << $F[1]
END{ grps.each{|k,v| puts "#{v.length} #{k}: #{v.join(", ")}"} }
' <(printf '%s\n' "${arr[@]}")

Or a two-pass awk:

awk -F, 'FNR==NR{cnt[$1]++; line[$1]=$1 in line ? line[$1] ", " $2 : $2; next}
!($1 in p){printf "%s %s: %s\n", cnt[$1],$1,line[$1]; p[$1]}
' <(printf '%s\n' "${arr[@]}") <(printf '%s\n' "${arr[@]}")

Either prints:

3 test: meta, amazon, google
2 my: amazon, google
1 this: meta
1 hello: microsoft