Correct way to unpack a 32 bit vector in Perl to read a uint32 written in C


I am parsing a Photoshop raw, 16 bit/channel, RGB file in C and trying to keep a log of exceptional data points. I need a very fast C analysis of images up to 36 MPix with 16-bit quanta, i.e. Photoshop .RAW files of up to 216 MB.

<1% of the points have weird skin tones and I want to graph them with PerlMagick or Perl GD to see where they are coming from.

The first 4 bytes of the C data file contain the unsigned image width as a uint32_t. In Perl, I read the whole file in binary mode and extract the first 32 bits:

Xres=1779105792l = 0x6a0b0000

It looks a lot like the C log file:

DA: Color anomalies=14177=0.229%:
DA: II=1) raw PIDX=0x10000b25,  XCols=[0]=0x00000b6a

Dec(0x00000b6a) = 2922, the Exact X_Columns_Width of a small test file.

Clearly a case of Intel's 1972 8008 NUXI architecture. How hard could it possibly be to translate 0x6a0b0000 to 0x00000b6a? Reverse the four bytes and you're done. Slicing the 8 hex characters and rearranging them could be done, but that is the kind of ugly hack I am trying to avoid.

Grab the same 32 bit vector from file offset zero and unpack it as "VAX" unsigned long.

$xres = vec($bdat, 0, 32);  # vec EXPR,OFFSET,BITS
$vul   = unpack("V", vec($bdat, 0, 32));
printf("Length (\$bdat)=%d, xres=0x%08x, Vax ulong=%ul=0x%08x\n",
    length($bdat), $xres, $vul, $vul);
Length ($bdat) = 56712, xres=0x6a0b0000, Vax ulong=959919921l=0x39373731

Every single hex character is mangled. Obviously the wrong endianness; it is not VAX. The "other" one is network big-endian:

http://perldoc.perl.org/functions/pack.html
N  An unsigned long (32-bit) in "network" (big-endian) order.
V  An unsigned long (32-bit) in "VAX" (little-endian) order.
$nul = unpack("N", vec($bdat, 0, 32));  # Network Unsigned Long 32b
printf("Xres=0x%08x, NET ulong=%ul=0x%08x\n", $xres, $nul, $nul);
Xres=0x6a0b0000, NET ulong=825702201l=0x31373739

The $xres still shows the right hex in the wrong order. The "network" 32-bit unsigned long extracted from the same bits is unrecognizable. Try binary:

$bits = unpack("b*", vec($bdat, 0, 32));
printf("bits=$bits, len=%d\n", length $bits);
   bits=10001100111011001110110010011100100011000000110010101100111011001001110001001100, len=80

I clearly asked for 32 bits and got 80 bits. What gives?

Try for 4, unsigned, 8bit bytes which can NOT be swapped:

for($ii = 0; $ii < 4; $ii++)  {
    $bit_off=$ii*8;  # Bit offset
    $uc = unpack("C", vec($bdat, $bit_off, 8));  # C  An unsigned char 
    printf("II $ii, bo $bit_off, d=%d, u=%u, x=0x%x\n", 
       $uc,$uc, $uc);
}
II 0, bo 0, d=49, u=49, x=0x31
II 1, bo 8, d=51, u=51, x=0x33
II 2, bo 16, d=49, u=49, x=0x31
II 3, bo 24, d=49, u=49, x=0x31

I am looking for hex 0, 6, a or b. There are no "3"s or "1"s in the right answer. Try pirating from a C file:

http://cpansearch.perl.org/src/MHX/Convert-Binary-C-0.76/tests/include/include/bits/byteswap.h
$x = $xres;
$x = ((($x & 0xff000000) >> 24) |
      (($x & 0x00ff0000) >>  8) |
      (($x & 0x0000ff00) <<  8) |
      (($x & 0x000000ff) << 24));
printf("\$xres=0x%08x -> \$x=0x%08x = %u\n", $xres, $x, $x);
$xres=0x6a0b0000 -> $x=0x00000b6a = 2922

It WORKS! But, this is uglier than converting the original, wrong order hex number to a string to untangle it:

$stupid_str = sprintf("%08x", $xres);
$stupid_num = join('', reverse ($stupid_str =~ m/../g));
printf("Stupid_num '%s'->0x%08x=%d\n", $stupid_num, $dec=hex $stupid_num, $dec);
Stupid_num '00000b6a'->0x00000b6a=2922

It's like judging the Ugliest Dog contest, but I would still rather have to maintain the text version than the even more abominable C version.

I know there are ways to do this in Java/Python/Go/Ruby/.....

I know there are command line utilities that do exactly this.

I must figure out how I am misusing either VEC or Unpack, both of which I have used a zillion times. It is the Brain Teasing aspect which is driving me nuts! EndianNess == EndianMess!!!

TYVM!

=================================================

Borodin,

Thanks for lookin' at this.

My Intel processor is little-endian. When I read the value back, it was trans-mutilated by vec to the "correct" big-endian, network format.

I just tried reading it VERBATIM from a BINARY file read and it works fine:

($b4 = $bdat) =~ s/^(....).*$/$1/msg;   # Give me my 4 bytes back without mutilation!
printf("B4='%s'=>0x%08x=<0x%08x\n", $b4, unpack("L>", $b4), unpack("L<", $b4));
B4='j...' = >0x6a0b0000 = <0x00000b6a   <<<  THE RIGHT ANSWER!!!

You wrote: "If you try unpack 'V', $bdat then you will find that it works"

That was my first attempt:

$vul = unpack("V", vec($bdat, 0, 32)); # UNPACK V!
printf("Length (\$bdat)=%d, xres=0x%08x, Vax ulong=%ul=0x%08x\n", length($bdat), $xres, $vul, $vul);
Length ($bdat) = 56712, xres=0x6a0b0000, Vax ulong=959919921l=0x39373731 <<<< TOTALLY WRONG!

I had already verified that the $BDAT info was the right data in the wrong format. It just needed some rearrangement.

I just used vec() to generate 1 bit and 4 bit graphics files and it worked faithfully, returning the exact bits I wrote. It must have mistaken my Intel i7 for my IBM System/370. I7/37??? Easy mistake to make. :)

I read the [confusing] part about "converted to a number as with pack ...". That's why my number was backward. The >>unpack("V", vec($bdat"<< ... was my ill-fated attempt to byte-swap the backward number in $BDAT from the WRONG VEC()-preferred FORMAT to the native format supported by my architecture.

Now I understand why I saw so many examples of people extracting by the byte, to avoid Big Brother's helping hand!

Data::BitStream::Vec "uses a Perl vec to store the data. The vector is accessed in 1-bit units"

Thanks 1E6,

B

There is 1 answer

Answer by Borodin

You are confusing things by combining vec with unpack.

The correct way is simply

unpack 'V', $bdat

which returns a value of 0x00000B6A, as you expect.
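As a minimal sketch of the whole round trip, the snippet below writes a small stand-in file and reads the width back with unpack 'V'. The temporary file and its four payload bytes are made up for illustration; only the 2922 width comes from the question.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical stand-in for the C data file: a 4-byte little-endian width
# (2922, as in the question) followed by a few made-up payload bytes.
my ($fh, $path) = tempfile(UNLINK => 1);
binmode $fh;
print {$fh} pack('V', 2922), "\x01\x02\x03\x04";
close $fh or die "close failed: $!";

# Slurp the file in raw (binary) mode, as the question does.
open my $in, '<:raw', $path or die "Cannot open $path: $!";
my $bdat = do { local $/; <$in> };
close $in;

# 'V' = unsigned 32-bit, little-endian; unpack reads the first four bytes
# of the string directly, so no vec call is needed.
my $width = unpack 'V', $bdat;
printf "width = %u (0x%08x)\n", $width, $width;
```

If the C writer ever runs on a big-endian machine, the template would presumably need to be 'N' instead, since uint32_t is written in the host's byte order.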

vec($bdat, 0, 32) is equivalent to unpack 'N', $bdat, as you can see from the value of $xres in your first code block, and the documentation for vec confirms this:

If BITS is 16 or more, bytes of the input string are grouped into chunks of size BITS/8, and each group is converted to a number as with pack()/unpack() with big-endian formats n/N
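A short sketch to check that equivalence on the four header bytes from the question. The trailing bytes 0xdeadbeef are made up so that a second 32-bit element exists. It also shows that OFFSET in vec counts elements of width BITS, not bits, so the byte loop in the question was stepping through bytes 0, 8, 16 and 24 rather than 0 to 3.

```perl
use strict;
use warnings;

# Header bytes from the question (width 2922, little-endian),
# plus four made-up bytes to give vec a second 32-bit element.
my $bdat = "\x6a\x0b\x00\x00" . "\xde\xad\xbe\xef";

my $via_vec = vec($bdat, 0, 32);   # big-endian read of bytes 0..3
my $via_N   = unpack 'N', $bdat;   # the same read, spelled with unpack
my $via_V   = unpack 'V', $bdat;   # little-endian read of bytes 0..3

printf "vec=0x%08x N=0x%08x V=0x%08x\n", $via_vec, $via_N, $via_V;

# OFFSET is an element index: with BITS=8, offset 1 is the second byte;
# with BITS=32, offset 1 is bytes 4..7.
my $byte1 = vec($bdat, 1, 8);
my $elem1 = vec($bdat, 1, 32);
printf "byte1=0x%02x elem1=0x%08x\n", $byte1, $elem1;
```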

The line

$vul = unpack("V", vec($bdat, 0, 32))

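is very wrong, because the decimal value of vec($bdat, 0, 32) is 1779105792, so you are then calling unpack on the string "1779105792". unpack decodes the ASCII characters of that decimal string as raw bytes, not the original four bytes of the file, so the result is meaningless.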
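To make that concrete, here is a small sketch that reproduces the odd values from the question using only the four header bytes. The variable names are mine, and the explicit stringification is there only to show what unpack actually receives.

```perl
use strict;
use warnings;

my $bdat = "\x6a\x0b\x00\x00";   # header bytes from the question
my $num  = vec($bdat, 0, 32);    # the number 1779105792
my $str  = "$num";               # unpack sees the text "1779105792"

# 'V' and 'N' consume the first four characters '1','7','7','9'
# (bytes 0x31 0x37 0x37 0x39) in little- and big-endian order.
my $v = unpack 'V', $str;
my $n = unpack 'N', $str;

# 'b*' expands every character of the string: 10 characters * 8 bits.
my $bits = unpack 'b*', $str;

# vec($bdat, 0, 8) is 106 (0x6a); unpack 'C' then sees the text "106"
# and returns the code of its first character, '1'.
my $c = unpack 'C', vec($bdat, 0, 8);

printf "V=0x%08x N=0x%08x bits=%d C=0x%02x\n", $v, $n, length($bits), $c;
```

These match the 0x39373731, 0x31373739, 80-bit and 0x31 results in the question, which suggests every one of them comes from decoding the digits of a decimal string.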