Issues with LZW algorithm variable-length decoding procedure

1.1k views Asked by At

The setup

Say I've got:

  • A series of numbers resulting from LZW compression of a bitmap:

    256 1 258 258 0 261 261 259 260 262 0 264 1 266 267 258 2 273 2 262 259 274 275 270 278 259 262 281 265 276 264 270 268 288 264 257
    
  • An LZW-compressed, variable-length-encoded bytestream (including the LZW code size header and sub-block markers) which represents this same series of numbers:

    00001000 00101001 00000000 00000011 00001000 00010100 00001000 10100000 
    01100000 11000001 10000001 00000100 00001101 00000010 01000000 00011000 
    01000000 11100001 01000010 10000001 00000010 00100010 00001010 00110000 
    00111000 01010000 11100010 01000100 10000111 00010110 00000111 00011010 
    11001100 10011000 10010000 00100010 01000010 10000111 00001100 01000001 
    00100010 00001100 00001000 00000000
    
  • And an initial code width of 8.

The problem

I'm trying to derive the initial series of numbers (the integer array) from the bytestream.

From what I've read, the procedure here is to take the initial code width, scan right-to-left, reading initial code width + 1 bits at a time, to extract the integers from the bytestream. For example:

iteration #1:   1001011011100/001/    yield return 4
iteration #2:   1001011011/100/001    yield return 1
iteration #3:   1001011/011/100001    yield return 6
iteration #4:   1001/011/011100001    yield return 6

This procedure will not work for iteration #5, which will yield 1:

iteration #5:   1/001/011011100001    yield return 1 (expected 9)

The code width should have been increased by one.

The question

How am I supposed to know when to increase the code width when reading the variable-length-encoded bytestream? Do I have all of the required information necessary to decompress this bytestream? Am I conceptually missing something?

UPDATE:

After a long discussion with greybeard - I found out that I was reading the binary string incorrectly: 00000000 00000011 00 is to be interpreted as 256, 1. The bytestream is not read as big-endian.

And very roughly speaking, if you are decoding a bytestream, you increase the number of bits read every time you read 2^N-1 codes, where N is the current code width.

1

There are 1 answers

3
greybeard On BEST ANSWER

Decompressing, you are supposed to build a dictionary in much the same way as the compressor. You know you need to increase the code width as soon as the compressor might use a code too wide for the current width.

As long as the dictionary is not full (the maximum code is not assigned), a new code is assigned for every (regular) code put out (not the Clear Code or End Of Information codes).

With the example in the presentation you linked, 8 is assigned when the second 6 is "transmitted" - you need to switch to four bits before reading the next code.

(This is where the example and your series of numbers differ - the link presents 4, 1, 6, 6, 2, 9.)