What is going on with the last two lext[] values in zlib?


Starting with the very first historical version committed to GitHub (zlib 0.71) we have the following code (https://github.com/madler/zlib/blob/bcf78a20978d76f64b7cd46d1a4d7a79a578c77b/inftrees.c#L42):

local uInt cplext[] = { /* Extra bits for literal codes 257..285 */
        0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2,
        3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 0, 128, 128}; /* 128==invalid */

This array determines the number of extra bits that need to be read to encode/decode a length code. (Despite the comment, all codes 257-285 are length codes, not literal codes.)

What is interesting here is that there are two extra entries padding this array out to 31 elements, despite the fact that (as the comment says) they are invalid: codes 257..285 need only 29 entries, and 285 is the final code.
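To make the role of the array concrete, here is a minimal sketch of how a decoder might turn a length code plus its extra bits into a match length. The base-length table is the one from RFC 1951, section 3.2.5; the helper name `decode_length` and the `-1` error convention are my own for illustration, not zlib's API, and the two padding entries are left out.

```c
/* Base match length for each length code 257..285 (RFC 1951, section 3.2.5). */
static const unsigned short length_base[29] = {
    3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 15, 17, 19, 23, 27, 31,
    35, 43, 51, 59, 67, 83, 99, 115, 131, 163, 195, 227, 258};

/* Extra bits per length code: the 29 valid entries of cplext[], padding omitted. */
static const unsigned char length_extra[29] = {
    0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2,
    3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 0};

/*
 * Hypothetical helper (not part of zlib): map a length code and the value
 * of its extra bits to a match length. Returns -1 if the symbol is not a
 * length code or the extra value does not fit in the allotted bits.
 */
static int decode_length(int symbol, unsigned extra_value)
{
    if (symbol < 257 || symbol > 285)
        return -1;

    int i = symbol - 257;

    /* The extra value must fit in length_extra[i] bits. */
    if (extra_value >> length_extra[i])
        return -1;

    return length_base[i] + (int)extra_value;
}
```

For example, `decode_length(265, 1)` gives 12 (base 11 plus one extra bit set), and any symbol past 285 is rejected before the tables are indexed, which is why the padding entries should never be consulted for valid input.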

What is more interesting is this revision in zlib 0.93 (https://github.com/madler/zlib/commit/6b834a58bdef976383cff6e2a83f353e668a9cf1)

 local uInt cplext[] = { /* Extra bits for literal codes 257..285 */
         0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2,
-        3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 0, 128, 128}; /* 128==invalid */
+        3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 0, 192, 192}; /* 192==invalid */

But even more interesting is zlib 1.05 (https://github.com/madler/zlib/commit/ebd3c2c0e734fc99a1360014ea52ed04fe6aade4#diff-742597d9ad7fae5292bb0302106fd3d610970279505adbc62425c26a1873600c)

 local const uInt cplext[31] = { /* Extra bits for literal codes 257..285 */
         0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2,
-        3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 0, 192, 192}; /* 192==invalid */
+        3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 0, 112, 112}; /* 112==invalid */

128 and 192 made sense as sentinel values, but 112 seems... arbitrary.

But all of this is just a warm-up for zlib 1.2.0, where inftrees.c was largely rewritten into its modern form and we start getting the really interesting behaviour: (https://github.com/madler/zlib/blob/7c2a874e50b871d04fbd19501f7b42cff55e5abc/inftrees.c#L63)

   static const unsigned short lext[31] = { /* Length codes 257..285 extra */
        16, 16, 16, 16, 16, 16, 16, 16, 17, 17, 17, 17, 18, 18, 18, 18,
        19, 19, 19, 19, 20, 20, 20, 20, 21, 21, 21, 21, 16, 73, 194};

The 29 valid entries are now each 16 more than their old values, surely for algorithmic optimization reasons I'm not bothering to look into, but the two junk entries are still there, and they seem... random.
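A quick way to check the +16 relationship is to compare the two tables entry by entry. The sketch below also splits each value into a flag bit and a bit count, on the assumption (from my reading of the op-field comment in zlib's inftrees.h, pattern `0001eeee`) that bit 0x10 marks a length or distance code and the low four bits hold the number of extra bits. The macro and function names are hypothetical.

```c
/* Valid entries of lext[] from zlib 1.2.0 as quoted above (padding omitted). */
static const unsigned short lext_valid[29] = {
    16, 16, 16, 16, 16, 16, 16, 16, 17, 17, 17, 17, 18, 18, 18, 18,
    19, 19, 19, 19, 20, 20, 20, 20, 21, 21, 21, 21, 16};

/* Valid entries of the older cplext[] (padding omitted). */
static const unsigned char cplext_valid[29] = {
    0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2,
    3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 0};

/* Assumed op layout 0001eeee: flag bit plus a 4-bit extra-bit count. */
#define OP_BASE_FLAG  0x10u
#define OP_EXTRA_MASK 0x0fu

/* Number of extra bits encoded in an op value (low four bits). */
static unsigned extra_bits_of(unsigned op)
{
    return op & OP_EXTRA_MASK;
}

/* Nonzero if the op value carries the assumed length/distance flag. */
static int has_base_flag(unsigned op)
{
    return (op & OP_BASE_FLAG) != 0;
}

/* Returns 1 if every valid new entry is the old entry with the flag added. */
static int tables_agree(void)
{
    for (int i = 0; i < 29; i++) {
        if (!has_base_flag(lext_valid[i]))
            return 0;
        if (extra_bits_of(lext_valid[i]) != cplext_valid[i])
            return 0;
    }
    return 1;
}
```

Under that assumption, the decoder can recover the extra-bit count with a single mask instead of a subtraction, which would explain why the values were shifted up by 16 rather than stored as-is.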

With the next release, the junk numbers are changed again... to different random-looking values:

     static const unsigned short lext[31] = { /* Length codes 257..285 extra */
         16, 16, 16, 16, 16, 16, 16, 16, 17, 17, 17, 17, 18, 18, 18, 18,
-        19, 19, 19, 19, 20, 20, 20, 20, 21, 21, 21, 21, 16, 73, 194};
+        19, 19, 19, 19, 20, 20, 20, 20, 21, 21, 21, 21, 16, 205, 64};

This pattern continues in every single zlib release. These two entries are changed to random-looking values, for no purpose I can discern. My question is: Why? Even as recently as two months ago, this was done in a commit that otherwise changes nothing but version numbers, and it was done consistently in two different places: https://github.com/madler/zlib/commit/9f0f2d4f9f1f28be7e16d8bf3b4e9d4ada70aa9f

It could be to build-stamp the library, but there are strings (including the version string) that already do that. These are invalid entries, so they should never have any effect on the output. I can't think of a purpose to constantly changing this array.
