Dedup multi-line records with Perl

I have multi-line records in a text file that I'd like to dedupe using Perl.

Records are delimited by the string "#end-of-record" and look like this:

CAPTAIN GIBLET'S NEWT CORRAL
555 RANDOM ST
TARDIS, CT 99999

We regret to inform you that we must repossess your pants in part due to your being 6 months late on payments. But mostly it's maliciousness. :)

TOTAL DUE: $30.00

#end-of-record

Here is my initial attempt:

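    #!/usr/bin/perl -w

    use strict;

    {
        # Read one whole record per <> call instead of one line.
        local $/ = "#end-of-record";

        my %seen;    # records already printed
        while ( my $record = <> ) {

            # Print a record only the first time it is seen.
            if (not exists $seen{$record}) {
                print $record;
                $seen{$record} = 1;
            }
        }

    }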

This prints every record, including the duplicates. Where did I go wrong?

UPDATE: The above code seems to work.

Answer by Kaz:
    gawk 'BEGIN { ORS = RS = "#end-of-record\n" } !seen[$0]++' yourfile