perl sort substrings using array to determine collating sequence

105 views Asked by At

I have a list of strings of the form:

CLUB1_20201008_EVE
CLUB1_20201008_AFT
CLUB1_20201008_AM
CLUB1_20201008_AM2
CLUB1_20201008_PM
CLUB1_20201008_NIGHT
CLUB2_20201008_EVE
CLUB2_20201008_AFT
CLUB2_20201008_AM
CLUB2_20201008_AM2
CLUB2_20201008_PM
CLUB2_20201008_NIGHT

I can get the sort to work by club name and date but the time of day needs to be sorted like AM,AM2,AFT,PM,EVE,NIGHT. These are obviously unsortable since their sequence is not alphabetical.

How do I use an auxiliary array to get them to sort in the proper order?

Thanks,Milt

Expected Output CLUB1_20201008_NIGHT CLUB2_20201008_NIGHT CLUB1_20201008_EVE CLUB2_20201008_EVE CLUB1_20201008_PM CLUB2_20201008_PM CLUB1_20201008_AFT CLUB2_20201008_AFT CLUB1_20201008_AM2 CLUB2_20201008_AM2 CLUB1_20201008_AM CLUB2_20201008_AM

3

There are 3 answers

1
Shawn On

One way is to use a hash table that maps the strings to their numerical order, and then comparing based on that:

#!/usr/bin/env perl
use warnings;
use strict;
use feature qw/say state/;

my @clubs = qw/CLUB1_20201008_EVE CLUB1_20201008_AFT CLUB1_20201008_AM
CLUB1_20201008_AM2 CLUB1_20201008_PM CLUB1_20201008_NIGHT
CLUB2_20201008_EVE CLUB2_20201008_AFT CLUB2_20201008_AM
CLUB1_20201008_AM2 CLUB1_20201008_PM CLUB1_20201008_NIGHT/;

sub sort_clubs {
    state $mappings = { AM => 1, AM2 => 2, AFT => 3, PM => 4,
                        EVE => 5, NIGHT => 6 };

    # For use with a Schwartzian Transform of the original data.
    # Expected format of arguments: [ [ club name, date, time ], original string ]
    my $cmp = $a->[0][0] cmp $b->[0][0];
    if ($cmp != 0) {
        return $cmp;
    }
    $cmp = $a->[0][1] <=> $b->[0][1];
    if ($cmp != 0) {
        return $cmp;
    } else {
        return $mappings->{$a->[0][2]} <=> $mappings->{$b->[0][2]}
    }
}

@clubs = map { $_->[1] } sort sort_clubs map { [ [ split(/_/, $_) ], $_ ] } @clubs;
say "@clubs";

Wikipedia has information on the Schwartzian Transform if you haven't seen that idiom for sorting before.

0
Polar Bear On

There are many possible ways to achieve desired result.

Note: code assumed based that you do not try to sort on club, date_hour_min and day_time at once as it not declared in your question. A sample of desired sorted output would be helpful for clearance of your problem -- we can not read what is on your mind.

Please research following two possible approaches.

You declared sort order based on time day index in the strings -- lets put it to use in some way. I will use AM,AM2,AFT,PM,EVE,NIGHT as initialization string for simplicity.

First approach utilizes %order hash with time day be a key and digit representing numerical order. Strings stored in HoA with numerical order as a key. Once you filled the hash just print according numerical key order with lines preserving order of their appearance on input.

use strict;
use warnings;
use feature 'say';

my $count = 0;
my @set   = split ',', 'AM,AM2,AFT,PM,EVE,NIGHT';
my %order = map { $_ => $count++ } @set;
my %result;

while( <DATA> ) {
    chomp;
    for my $k ( keys %order ) {
        push @{$result{$order{$k}}}, $_ if /_$k\z/;
    }
}

for( sort {$a <=> $b} keys %result ) {
    say for @{ $result{$_} };
}


__DATA__
CLUB1_20201008_EVE
CLUB1_20201008_AFT
CLUB1_20201008_AM
CLUB1_20201008_AM2
CLUB1_20201008_PM
CLUB1_20201008_NIGHT
CLUB2_20201008_EVE
CLUB2_20201008_AFT
CLUB2_20201008_AM
CLUB1_20201008_AM2
CLUB1_20201008_PM
CLUB1_20201008_NIGHT

Output

CLUB1_20201008_AM
CLUB2_20201008_AM
CLUB1_20201008_AM2
CLUB1_20201008_AM2
CLUB1_20201008_AFT
CLUB2_20201008_AFT
CLUB1_20201008_PM
CLUB1_20201008_PM
CLUB1_20201008_EVE
CLUB2_20201008_EVE
CLUB1_20201008_NIGHT
CLUB1_20201008_NIGHT

Second approach is even more simple. Push the lines in HoA %result based on time day index (at the end of the line). Then print print HoA according predefined $order array.

use strict;
use warnings;
use feature 'say';

my @order = split ',', 'AM,AM2,AFT,PM,EVE,NIGHT';
my %result;

while( <DATA> ) {
    chomp;
    push @{$result{$1}}, $_ if /_([^_]+)\z/;
}

for( @order ) {
    say for @{ $result{$_} };
}


__DATA__
CLUB1_20201008_EVE
CLUB1_20201008_AFT
CLUB1_20201008_AM
CLUB1_20201008_AM2
CLUB1_20201008_PM
CLUB1_20201008_NIGHT
CLUB2_20201008_EVE
CLUB2_20201008_AFT
CLUB2_20201008_AM
CLUB1_20201008_AM2
CLUB1_20201008_PM
CLUB1_20201008_NIGHT

Output

CLUB1_20201008_AM
CLUB2_20201008_AM
CLUB1_20201008_AM2
CLUB1_20201008_AM2
CLUB1_20201008_AFT
CLUB2_20201008_AFT
CLUB1_20201008_PM
CLUB1_20201008_PM
CLUB1_20201008_EVE
CLUB2_20201008_EVE
CLUB1_20201008_NIGHT
CLUB1_20201008_NIGHT
0
ikegami On

The trick is to create a map between each time-of-day string and a number you can sort.

my @shifts = qw( AM AM2 AFT PM EVE NIGHT );
my %shift_order = map { $shifts[$_] => $_ } 0..$#shifts;

Solutions based on Sort-Key will be among the cleanest and fastest.

use Sort::Key::Multi qw( ssukeysort );

my @shifts = qw( AM AM2 AFT PM EVE NIGHT );
my %shift_order = map { $shifts[$_] => $_ } 0..$#shifts;

my @sorted =
   ssukeysort {
      my ($club, $date, $shift) = split /_/;
      ( $club, $date, $shift_order{$shift} )
   }
      @unsorted;

Simple solution without modules:

my @shifts = qw( AM AM2 AFT PM EVE NIGHT );
my %shift_order = map { $shifts[$_] => $_ } 0..$#shifts;

my @sorted =
   sort {
      my ($club_a, $date_a, $shift_a) = split /_/, $a;
      my ($club_b, $date_b, $shift_b) = split /_/, $b;

      $club_a cmp $club_b
         or
      $date_a cmp $date_b
         or
      $shift_order{$shift_a} <=> $shift_order{$shift_b}
   }
      @unsorted;

Performant solution without modules:

my @shifts = qw( AM AM2 AFT PM EVE NIGHT );
my %shift_order = map { $shifts[$_] => $_ } 0..$#shifts;

my @sorted =
   map { ( split /\0/ )[-1] }
      sort
         map {
            my ($club, $date, $shift) = split /_/;
            join("\0", $club, $date, $shift_order{$shift}, $_)
         }
            @unsorted;