Dedup multi line records with perl

Question

Asked by Bubnoff on 21 November 2014 at 03:27

I have multi-line records in a text file that I'd like to dedupe using Perl.

Records are delimited by the string "#end-of-record" and look like this:

CAPTAIN GIBLET'S NEWT CORRAL
555 RANDOM ST
TARDIS, CT 99999

We regret to inform you that we must repossess your pants in part due to your being 6 months late on payments. But mostly it's maliciousness. :)

TOTAL DUE: $30.00

#end-of-record

Here is my initial attempt:

    #!/usr/bin/perl -w

    use strict;

    {
            local $/ = "#end-of-record";

            my %seen;
            while ( my $record = <> ) {

                    if (not exists $seen{$record}) {
                            print $record;
                            $seen{$record} = 1;
                    }
            }

    }

This prints every record, including the duplicates. Where did I go wrong?

UPDATE
The code above seems to work.

There is 1 answer

**Kaz** · Answer 1 · 2015-12-15T20:38:36+00:00

    gawk 'BEGIN { ORS = RS = "#end-of-record\n" } !seen[$0]++' yourfile
