Get the n-best key-value pairs from a hash table in Perl


I have this script that uses a hash table:

#!/usr/bin/env perl

use strict; use warnings;

my $hash = {
      'cat' => {
               "félin" => '0.500000',
               'chat' => '0.600000',
               'chatterie' => '0.300000',
               'chien' => '0.01000'
             },
      'rabbit' => {
                  'lapin' => '0.600000'                     
                },
      'canteen' => {
                   "ménagère" => '0.400000',
                   'cantine' => '0.600000'
                 }
       };

my $text = "I love my cat and my rabbit canteen !\n";

foreach my $word (split /\s+/, $text) {
    print $word;
    exists $hash->{$word}
        and print "[" . join(";", keys %{ $hash->{$word} }) . "]";
    print " ";
}

At the moment I get this output:

I love my cat[chat;félin;chatterie;chien] and my rabbit[lapin] canteen[cantine;ménagère] !

I need the n best translations for each word, ranked by the frequencies stored as values in my hash. For example, I want the 3 best translations, like this:

I love my cat[chat;félin;chatterie] and my rabbit[lapin] canteen[cantine;ménagère] !

How can I change my code to take the frequency of each translation into account and print only the n best ones?

Thanks for your help.

There is 1 answer

Borodin

The tidiest way to do this is to write a subroutine that returns the N most frequent translations for a given word. I have written best_n in the program below to do that. It uses rev_nsort_by from List::UtilsBy to do the sort succinctly. It isn't a core module, and so may well need to be installed.

I have also used a substitution with the /e modifier, which evaluates the replacement as Perl code, to modify the string in place.

use utf8;
use open qw/ :std :encoding(UTF-8) /;  # encode the accented output as UTF-8
use strict;
use warnings;

use List::UtilsBy qw/ rev_nsort_by /;

my $hash = {
  'cat'     => {
    'félin'     => '0.500000',
    'chat'      => '0.600000',
    'chatterie' => '0.300000',
    'chien'     => '0.01000',
  },
  'rabbit'  => {
    'lapin'     => '0.600000',
  },
  'canteen' => {
    'ménagère'  => '0.400000',
    'cantine'   => '0.600000',
  }
};

my $text = "I love my cat and my rabbit canteen !\n";

$text =~ s{(\S+)}{
   $hash->{$1} ? sprintf '[%s]', join(';', best_n($1, 3)) : $1;
}ge;

print $text;

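sub best_n {
  my ($word, $n) = @_;
  my $item = $hash->{$word};
  # Sort the translations by frequency, highest first
  my @xlate = rev_nsort_by { $item->{$_} } keys %$item;
  # Turn the count into a last index, capped at the end of the list
  $n = $n > @xlate ? $#xlate : $n - 1;
  return @xlate[0..$n];
}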

output

I love my [chat;félin;chatterie] and my [lapin] [cantine;ménagère] !
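If installing List::UtilsBy is not an option, or if the original word should be kept in front of its translations as in the question's desired output, the same idea can be sketched with the built-in sort alone. This is only an illustrative variant, not part of the answer's program: the names best_n_core and annotate are made up for this example, and ties between equal frequencies are broken alphabetically, which is an assumption the question does not state.

```perl
use utf8;
use strict;
use warnings;
use open qw/ :std :encoding(UTF-8) /;   # emit UTF-8 on STDOUT/STDERR

# Same data as in the question.
my $hash = {
  cat     => { 'félin' => '0.500000', chat => '0.600000',
               chatterie => '0.300000', chien => '0.01000' },
  rabbit  => { lapin => '0.600000' },
  canteen => { 'ménagère' => '0.400000', cantine => '0.600000' },
};

# Hypothetical helper: return up to $n translations of $word,
# highest frequency first, using only core Perl.
sub best_n_core {
  my ($word, $n) = @_;
  my $item = $hash->{$word} or return;   # unknown word: empty list
  # Numeric descending sort on frequency; alphabetical order breaks ties
  my @xlate = sort { $item->{$b} <=> $item->{$a} or $a cmp $b } keys %$item;
  $n = @xlate if $n > @xlate;            # cap at the number available
  return @xlate[0 .. $n - 1];
}

# Hypothetical helper: append "[t1;t2;...]" to every known word
# instead of replacing the word itself.
sub annotate {
  my ($text, $n) = @_;
  $text =~ s{(\S+)}{
    $hash->{$1} ? sprintf('%s[%s]', $1, join ';', best_n_core($1, $n)) : $1
  }ge;
  return $text;
}

print annotate("I love my cat and my rabbit canteen !\n", 3);
```

With the question's data this prints I love my cat[chat;félin;chatterie] and my rabbit[lapin] canteen[cantine;ménagère] ! which is the format the question asks for.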