How to count the unique values in 2 fields without concatenating them?

2.5k views Asked by At

I am trying to using the basic shell in unix to count the unique values in 2 fields. I have data with 5 columns but just what to count the unique values in the first 2 WITHOUT concatenating them. So far I am successful in using cut -f 1 | sort | uniq | wc -l in counting the unique values in column one and I can do the same for column two but because some of the values are the same in column one and two I need to be able to do this command treating column 1 and 2 as one field. Can anyone help me please?

1

There are 1 answers

1
rr- On

Your question can be interpreted in two ways so I answer both of them.

Given the input file:

2       1
2       1
1       2
1       1
2       2

If you want the result to output 4 because the unique pairs are 1 1, 1 2, 2 1 and 2 2, then you need:

cat test|cut -f1,2|sort|uniq|wc -l

What we do here: we pick only first two columns as well as the delimiter and pass it to sort|uniq which does the job.

If you, on the other hand, want the result to output 2 because there are only two unique elements: 1 and 2, then you can tweak the above like this:

cat test|cut -f1,2|tr "\t" "\n"|sort|uniq|wc -l

This time after we select first two columns, we split each of them into two lines so that sort|uniq picks them up.

These work as long as the columns are separated with TAB character, not spaces. Since you didn't pass -d option to cut in your question and cut uses tabs by default, I assumed your input uses tabs too.