Reading binary file in Perl

2k views Asked by At

I have a binary file with a bunch of data blocks stored each after another. The data block is formatted in the following style:

Length [byte]   Content       Description
2               0xFFFF        Data block header
4               Epoch time    seconds since 00:00:00 UTC, January 1, 1970
2               value of N    Length of data following this value
N               Data          Data itself

I tried to use unpack, but this was wrong because of the non-fixed length of data.

I need to write a subroutine which will read and parse data blocks (one data block each time the subroutine is called) until the end of file.

The file is using big-endian.

This is what I tried until now:

use strict;
use warnings;

my $filename;

if (! $ARGV[0])
{
    die "Input filename is required";
}

sub setFile
{
    $filename = $_[0];
}

my $inFile = $ARGV[0];

setFile($inFile);

open INFILE, $filename or die "\nUnable to open input file";

binmode INFILE;

my $nbytes;

while (<INFILE>) {
    my( $header, $timestamp_hex, $datalength_hex ) = unpack 'H4 H8 H4', $_;
    my $timestamp = hex($timestamp_hex);
    my $datalength = hex($datalength_hex);
    print "$timestamp $datalength\n";

    for (my $i = 0; $i < $datalength; $i++)
    {
        my $data = unpack 'H', $_;
        print "$data";
    }
    print "\n";
}

close INFILE
    or die "Error while closing $filename: $!\n";
1

There are 1 answers

0
ikegami On

<INFILE> makes no sense. It reads until a newline is found.

If you have the whole file in memory, you can use the following:

my @fields = unpack('( n N n/a* )*', $file);
while (@fields) {
   my ($sig, $ts, $data) = splice(@fields, 0, 3);
   die "Incorrect signature" if $sig != 0xFFFF;
   process_rec($ts, $data);
}

If we were to extract the header separately from the data, we can save memory and add some error checking.

use constant HEADER_FORMAT => 'nNn';
use constant HEADER_LENGTH => length(pack(HEADER_FORMAT, 0, 0, 0));

while (length($file)) {
   last if length($file) < HEADER_LENGTH;
   my ($sig, $ts, $data_len) = unpack(HEADER_FORMAT, substr($file, 0, HEADER_LENGTH, ''));
   die "Incorrect signature" if $sig != 0xFFFF;
   last  if length($file) < $data_len;
   process_rec($ts, substr($file, 0, $data_len, ''));
}

die "Premature EOF" if length($file);

Reading from a file handle is an extension of the second snippet. If you don't have the whole file in memory, you can use the following:

use constant HEADER_FORMAT => 'nNn';
use constant HEADER_LENGTH => length(pack(HEADER_FORMAT, 0, 0, 0));
use constant BLOCK_SIZE    => 128*1024;

sub make_fill_to = sub {
   my $fh      =  shift;
   my $buf_ref = \shift;
   my $eof = 0;

   return sub {
      my $bytes_needed = $_[1];
      while (!$eof && length($$buf_ref) < $bytes_needed) {
         my $rv = sysread($fh, $$buf_ref, BLOCK_SIZE, length($$buf_ref));
         die $! if !defined($rv);
         $eof = 1 if !$rv;
      }

      return !$eof;
   }
};

my $buf = '';
my $fill_to = make_fill_to($fh, $buf);
while (1) {
   $fill_to->(HEADER_LENGTH)
      or last LOOP;
   my ($sig, $ts, $data_len) = unpack(HEADER_FORMAT, substr($buf, 0, HEADER_LENGTH, ''));
   die "Incorrect signature" if $sig != 0xFFFF;
   $fill_to->($data_len)
      or last LOOP;
   process_rec($ts, substr($buf, 0, $data_len, ''));
}

die "Premature EOF" if length($buf);

When using select to manage multiple handles, the read must come first, so I'm used to writing it that way. The following is what the above would look like if it was refactored to put the read first:

use constant HEADER_FORMAT => 'nNn';
use constant HEADER_LENGTH => length(pack(HEADER_FORMAT, 0, 0, 0));
use constant BLOCK_SIZE    => 128*1024;

my $buf = '';
my ($got_header, $sig, $ts, $data_len);
while (1) {
   my $rv = sysread($fh, $buf, BLOCK_SIZE, length($buf));
   die $! if !defined($rv);
   last if !$rv;

   while (1) {
      if (!$got_header) {
         last if length($buf) < HEADER_LENGTH;          
         ($sig, $ts, $data_len) = unpack(HEADER_FORMAT, substr($buf, 0, HEADER_LENGTH, ''));
         die "Incorrect signature" if $sig != 0xFFFF;
         $got_header = 1;
      }

      last if length($buf) < $data_len;
      process_rec($ts, substr($buf, 0, $data_len, ''));
      $got_header = 0;
   }
}

die "Premature EOF" if $got_buffer || length($buf);