I want to understand this LZSS-based algorithm in order to write a compressor (and maybe a better decompressor). I am studying LZ77 and LZSS, but there are a few lines that I still don't get.
I commented the following code as much as I can; the first comment block is not mine and explains the header.
//
// ULZ header (20 bytes)
// ---------------------
// 0x0000 - 4 bytes: signature "ULZ" + 0x1A
// 0x0004 - 2 bytes: 0
// 0x0006 - 1 byte : mode? (2/normal LZ, 0/unknown LZ type)
// 0x0007 - 1 byte : nbits for LZ algorithm
// 0x0008 - 4 bytes: offset to decompressed data (aligned to 32 bits)
// 0x000C - 4 bytes: offset to compressed data (aligned to 32 bits)
// 0x0010 - 4 bytes: size of decompressed file
//
// ULZ body (aligned to 32 bits)
// -----------------------------
// 0x0014 - X bytes: flags data
// ?????? - X bytes: decompressed data (pointer in 0x0008)
// ?????? - X bytes: compressed data (pointer in 0x000C)
//
// in:
// source - ULZ buffer (a valid ULZ file with header)
// target - data buffer for the decompressed data
// out:
// 0 - decompression ok
// 1 - decompression failed
int UnULZ(unsigned char *source, unsigned char *target) {
unsigned char *ulz_f, *ulz_d, *ulz_c, *tgt, *end;
unsigned int len, pos, nbits, msk;
unsigned char flags, mask;
if (source[0x06] != 0x02) {
if (_DEBUG_) printf("Mode %02X not supported\n", source[0x06]);
return (1);
}
//the top comment block only says "nbits for LZ algorithm";
//this value is used later for shifts, but how is it calculated in the first place?
nbits = source[0x07];
//why shift 0xFFFF right by (16 - nbits)? Does that mean nbits can be at most 16?
msk = 0xFFFF >> (16 - nbits);
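//e.g. with nbits = 10: msk = 0xFFFF >> 6 = 0x03FF, so (my reading of
//the decode below) each 16-bit token keeps its low 10 bits for the
//offset and its high 6 bits for the run length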
//points to the first flag byte
ulz_f = source + 0x14;
//points to the start of the literal ("decompressed") data
ulz_d = source + *(unsigned int *)(source + 0x08);
//where compressed data starts
ulz_c = source + *(unsigned int *)(source + 0x0C);
//target buffer
tgt = target;
//the format tells us the final size
end = target + *(unsigned int *)(source + 0x10);
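//note: from here on the three streams (flag bits, literal bytes,
//16-bit match tokens) are consumed in parallel; each flag bit
//decides which of the other two streams to read next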
for (mask = 0; tgt < end; mask <<= 1) {
/*
Now we check every flag bit: if the bit is 1
we have an uncompressed byte to copy into the target buffer
*/
if (!mask) {
mask = 0x01;
/*
load the next flag byte once we have used
all eight bits of the previous one
*/
flags = *ulz_f++;
}
if (flags & mask) {
//the flag bit tells us to copy one uncompressed
//byte into the target buffer
*tgt++ = *ulz_d++;
} else {
/*
Here starts the part that I don't understand.
I know we have to find out which already-written
data to copy and how long the "run" for it is.
So we read 2 bytes that together encode the run
(called len here) and the offset to copy from.
*/
pos = *ulz_c++;
pos |= *ulz_c++ << 8;
//why do we have to shift by nbits and add 2?
//how is that supposed to help the compression?
len = (pos >> nbits) + 2;
//why this bitwise AND, and why the + 1?
pos = (pos & msk) + 1;
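//worked example (mine, assuming nbits = 10): token bytes A3 41 give
//pos = 0x41A3, so len = (0x41A3 >> 10) + 2 = 16 + 2 = 18 and
//pos = (0x41A3 & 0x03FF) + 1 = 0x1A3 + 1 = 420 bytes back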
//copy len bytes from pos bytes back in the target buffer;
//pos can be smaller than len, so the copy may overlap
//the bytes it has just written (run-length style)
while (len--) {
*tgt = *(tgt - pos);
tgt++;
}
}
}
return (0);
}
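For reference while writing the compressor, this is how I would describe the header as a struct (my own sketch of the comment block at the top, assuming a packed little-endian 20-byte layout; the field names are mine):

#include <stdint.h>

#pragma pack(push, 1)
typedef struct {
    uint8_t  signature[4]; /* "ULZ" + 0x1A                          */
    uint16_t zero;         /* always 0                              */
    uint8_t  mode;         /* 2 = normal LZ, 0 = unknown LZ type    */
    uint8_t  nbits;        /* bit split of the 16-bit match tokens  */
    uint32_t literals_off; /* offset to the "decompressed" data     */
    uint32_t matches_off;  /* offset to the compressed (token) data */
    uint32_t out_size;     /* size of the decompressed file         */
} ULZHeader;               /* flag data follows at offset 0x14      */
#pragma pack(pop)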
To recap: why does the original author use the nbits value in this way?
msk = 0xFFFF >> (16 - nbits);
Why use those constants and shift by nbits?
pos = *ulz_c++;
pos |= *ulz_c++ << 8;
len = (pos >> nbits) + 2;
pos = (pos & msk) + 1;
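If my reading of the decoder is right, a compressor has to do the exact inverse packing when it emits a match. A minimal sketch (my own code, not from the original author, assuming 2 <= len, 1 <= pos <= msk + 1, and len - 2 < (1 << (16 - nbits)) so both fields fit):

/* pack one match into the two token bytes the decoder above expects */
static void PutMatch(unsigned char **ulz_c, unsigned int len,
                     unsigned int pos, unsigned int nbits) {
    unsigned int token = ((len - 2) << nbits) | (pos - 1);
    *(*ulz_c)++ = token & 0xFF;        /* low byte first...      */
    *(*ulz_c)++ = (token >> 8) & 0xFF; /* ...then high byte (LE) */
}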
Thanks