Sum rows of different columns by match in the second column

Question

Sum rows of different columns by match in the second column

88 views Asked by Cristbal Alejandro Hernndez lv At 12 February 2018 at 17:45

I have a file like this:

ID  Category Sample1 Sample2 Sample3
1   A        5       5       5
2   A        5       5       5
3   A        5       5       5
4   B        1       2       3
5   B        1       2       3

And I'm looking for an awk, sed or similar solution to achieve this:

ID  Category Sample1 Sample2 Sample3
1   A        15      15      15
4   B        2       4       6

The idea is to sum rows of matching categories, considering the values of each sample column and remove the repeated IDs.

Original Q&A

There are 1 answers

**7stud** · Answer 1 · 2018-02-12T19:44:03+00:00

data1.txt:

ID  Category Sample1 Sample2 Sample3
1   A        5       5       5
2   A        5       5       5
3   A        5       5       5
4   B        1       2       3
5   B        1       2       3

Code:

use strict;
use warnings; 
use 5.020;
use autodie;
use Data::Dumper;
use List::MoreUtils qw{ uniq };

open my $INFILE, '<', 'data1.txt';

my %results;
my @cat_order;

#Get names of column headers:
my ($h1, $h2, @sample_names) =  split ' ', <$INFILE>;

while (my $line = <$INFILE>) {
    my($id, $cat, @samples)= split ' ', $line;
    push @cat_order, $cat;

    push @{ $results{$cat}{$h1} }, $id;  #e.g. push results{A}{ID}, 1 

    while ( my($i, $sample) = each @samples ) {
        $results{$cat}{ $sample_names[$i] } += $sample;  #e.g. results{A}{Sample1} += 5   
    }
}

close $INFILE;
open my $OUTFILE, '>', 'results.txt';

my $format = "%-3s %-9s %-9s %-9s %-9s\n";
printf {$OUTFILE} $format, $h1, $h2, @sample_names;

for my $cat (uniq(@cat_order)) {
    printf( $OUTFILE 
        $format, 
        $results{$cat}{$h1}[0],  #e.g. results{A}{ID}[0], which is the first id in the ID array
        $cat,  #e.g. A
        @{ $results{$cat} }{@sample_names}  #e.g. results{A}{'Sample1', 'Sample2', 'Sample3'}  -- a hash slice,
                                            #which returns an array of the values matching those keys.
    )
}

close $OUTFILE;

Output:

$ rm results.txt
remove results.txt? y

$ perl 1.pl 

$ cat results.txt 
ID  Category  Sample1   Sample2   Sample3  
1   A         15        15        15       
4   B         2         4         6

And with data1.txt:

ID  Category Sample1 Sample2 Sample3
1   A        5       5       5
2   B        1       2       3
3   A        5       5       5
4   C        10      11      12
5   B        1       2       3
6   A        5       5       5
7   C        1       1       1

Output:

$ rm results.txt
remove results.txt? y

$ perl 1.pl 
$ cat results.txt 
ID  Category  Sample1   Sample2   Sample3  
1   A         15        15        15       
2   B         2         4         6        
4   C         11        12        13

%results:

$VAR1 = {
          'C' => {
                   'ID' => [
                             '4',
                             '7'
                           ],
                   'Sample1' => 11,
                   'Sample3' => 13,
                   'Sample2' => 12
                 },
          'B' => {
                   'ID' => [
                             '2',
                             '5'
                           ],
                   'Sample1' => 2,
                   'Sample3' => 6,
                   'Sample2' => 4
                 },
          'A' => {
                   'Sample1' => 15,
                   'ID' => [
                             '1',
                             '3',
                             '6'
                           ],
                   'Sample2' => 15,
                   'Sample3' => 15
                 }
        };

TechQA.

Sum rows of different columns by match in the second column

There are 1 answers

Related Questions in SHELL

Related Questions in PERL

Related Questions in AWK

Related Questions in SED

Popular Questions

Popular Tags

Trending Questions