Bash: get value of one column based on value of another column

Question

Bash: get value of one column based on value of another column

1.6k views Asked by StatsSorceress At 16 March 2017 at 17:03

I have a space-separated file that contains:

5.75e-01 7.00e-1 5.52e-01 7.33e-01 ./dir1/dir2/file1.csv
5.75e-01 7.00e-1 5.42e-01 7.34e-01 ./dir1/dir2/file2.csv
5.75e-01 7.00e-1 5.72e-01 7.43e-01 ./dir2/dir2/file1.csv
5.75e-01 7.00e-1 5.22e-01 7.23e-01 ./dir2/dir2/file2.csv
5.75e-01 7.00e-1 5.02e-01 7.93e-01 ./dir3/dir2/file1.csv
5.75e-01 7.00e-1 5.12e-01 7.63e-01 ./dir3/dir2/file2.csv

I would like to extract the value of column 5 that corresponds to the maximum of column 3 for each value of dir#. For example, let's say I'm talking about dir1. That corresponds to these rows:

5.75e-01 7.00e-1 5.52e-01 7.33e-01 ./dir1/dir2/file1.csv
5.75e-01 7.00e-1 5.42e-01 7.34e-01 ./dir1/dir2/file2.csv

and I can find these using:

max_val_acc_=$(awk '$5 ~ /dir1/ { print }' filename.txt)
echo $max_val_acc

Now I think I need to pipe this result through a sort and take the head, but I can't get it working. The result I'm looking for (for dir1) is:

./dir1/dir2/file1.csv

and the complete result for all dir#:

./dir1/dir2/file1.csv
./dir2/dir2/file1.csv
./dir3/dir2/file2.csv

Original Q&A

There are 5 answers

**James Brown** · Answer 1 · 2017-03-16T17:19:33+00:00

I'm not sure I understood you right but this is in awk how I understood you:

awk -v s="dir1" '         # search parameter in your s
index($5,"./" s "/") {    # if your s is found in $5
    if(max==""||$3>max){  # we initialize $3 or compare to previous max
        max=$3;           # store new max
        maxv=$5           # and new mac value 
    }
}
END{ print maxv }         # print the stored max value
' file                    # oh just the file
./dir1/dir2/file1.csv

**Ed Morton** · Answer 2 · 2017-03-16T17:29:07+00:00

Is this what you're looking for?

$ cat tst.awk
{
    split($5,path,"/")
    dir = path[2]
    if ( !(dir in max) || ($3 > max[dir]) ) {
        max[dir] = $3
        val[dir] = $5
    }
}
END {
    for (dir in val) {
        print val[dir]
    }
}

$ awk -f tst.awk file
./dir3/dir2/file2.csv
./dir1/dir2/file1.csv
./dir2/dir2/file1.csv

**karakfa** · Answer 3 · 2017-03-16T18:52:42+00:00

another alternative with sort and awk

$ sort -k5 -k3,3r file | awk -F/ '!a[$NF]++'

5.75e-01 7.00e-1 5.52e-01 7.33e-01 ./dir1/dir2/file1.csv
5.75e-01 7.00e-1 5.42e-01 7.34e-01 ./dir1/dir2/file2.csv

above is for filenames, if based on first dir names

$ sort -k3,3r file | awk '{split($NF,d,"/")} !a[d[2]]++'

5.75e-01 7.00e-1 5.72e-01 7.43e-01 ./dir2/dir2/file1.csv
5.75e-01 7.00e-1 5.52e-01 7.33e-01 ./dir1/dir2/file1.csv
5.75e-01 7.00e-1 5.12e-01 7.63e-01 ./dir3/dir2/file2.csv

and if you want to print the dirs only

$ sort -k3,3r file | awk '{split($NF,d,"/")} !a[d[2]]++{print $NF}'

./dir2/dir2/file1.csv
./dir1/dir2/file1.csv
./dir3/dir2/file2.csv

**clt60** · Answer 4 · 2017-03-16T21:15:50+00:00

Just for fun - without any (awk, perl like) programming language

file="./data.txt"
paste -d ' ' "$file" <(cut -d/ -f2 "$file") |\
    LC_ALL=C sort -k6 -k3gr | uniq -f5 | cut -d' ' -f5

output

./dir1/dir2/file1.csv
./dir2/dir2/file1.csv
./dir3/dir2/file2.csv

Use some awk solution. As i said, this is only for showing another way.

**gregory** · Answer 5 · 2017-03-16T21:35:11+00:00

While not matching your requested output and similar to the other answers already posted, I find this awk command more memorable:

< file | sort -k3,3r | awk -F "/" '!seen[$2]++'

output:

5.75e-01 7.00e-1 5.72e-01 7.43e-01 ./dir2/dir2/file1.csv
5.75e-01 7.00e-1 5.52e-01 7.33e-01 ./dir1/dir2/file1.csv
5.75e-01 7.00e-1 5.12e-01 7.63e-01 ./dir3/dir2/file2.csv

The more general point being: sort the entire list by size (column 3) and don't worry about sorting it by directory name (part of column 5), then extract the first of each directory name (i.e. awk prints only newly seen key name).

If you really want the output to just the directories names and sorted, then add the following to the pipe chain:

| cut -d ' ' -f5- | sort

TechQA.

Bash: get value of one column based on value of another column

There are 5 answers

Related Questions in BASH

Related Questions in SORTING

Related Questions in AWK

Related Questions in UNIX-HEAD

Popular Questions

Trending Questions