Binary interoperability between gzdecode() and inflate_add()


I wrote a small WebSocket library a while back, and found adding gzip support surprisingly easy. I didn't fully realize at the time that the deflate_init() / deflate_add() / inflate_init() / inflate_add() functions were actually PHP 7-only, and now I'd like to be able to run my WebSocket server under PHP 5 environments.

My problem is that deflate_add() produces output that differs slightly from gzdeflate(), by one character in the test case below.

The deflate_add()/inflate_add()-based approach works perfectly in-browser, so the output of gzdeflate() must be the incorrect one. I'm guessing gzdeflate()/gzinflate() use zlib with different underlying options (something related to stream state, maybe?), and that's causing everything to fall apart.

Ultimately I want to know if I can convince PHP 5-era zlib functions to output "correct" deflated data.


First of all, the deflate_init()/deflate_add()-based approach I used on PHP 7:

$data = "ABC";

$ctx = deflate_init(ZLIB_ENCODING_RAW);

// unfortunately I can't find the gigantic blog post with example code
// that I learned from :(, but it contained the Ruby equivalent of
// the substr() below. I blinked at it a bit, but apparently this is how
// it's done.
$deflated = substr(deflate_add($ctx, $data, ZLIB_SYNC_FLUSH), 0, -4);

// $deflated is now "rtr\6\0"

$ictx = inflate_init(ZLIB_ENCODING_RAW);

$data2 = inflate_add($ictx, $deflated, ZLIB_NO_FLUSH);

// $data2 is now "ABC"

Here's what happens if I use gzdeflate()/gzinflate():

$data = "ABC";

$deflated = gzdeflate($data, 9, ZLIB_ENCODING_RAW);

// $deflated is now "str\6\0"

$output = gzinflate($deflated);

// $output is now "ABC"

Trying to gzinflate() the output of deflate_add() produces a data error. As a TL;DR:

print gzinflate("rtr\6\0")."\n"; // will bomb out: "gzinflate(): data error" warning, returns false

print gzinflate("str\6\0")."\n"; // prints "ABC"

Answer by Mark Adler:

What you are calling correct is incorrect, and what you are calling incorrect is correct.

With deflate_add you are deliberately creating an unterminated, i.e. invalid, deflate stream. Why, I have no idea. (Nor, apparently, do you, since this came from a "gigantic blog post" that you cannot find.) This is done with ZLIB_SYNC_FLUSH, which completes the current deflate block and appends an empty stored block. The substr(..., 0, -4) then removes most of that empty stored block (its four length bytes, 00 00 ff ff) from the end, leaving you with an incomplete, invalid deflate stream that ends prematurely in the middle of a stored block.
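A minimal sketch of the bytes involved, assuming PHP 7+ (for deflate_init()/deflate_add()) and the same "ABC" input as the question. The hex values in the comments are the question's own "rtr\6\0" result followed by the four length bytes of the empty stored block; the variable names are only illustrative.

```php
<?php
// Sketch: requires PHP 7+ for deflate_init()/deflate_add().
$ctx = deflate_init(ZLIB_ENCODING_RAW);
$flushed = deflate_add($ctx, "ABC", ZLIB_SYNC_FLUSH);

// Fixed-Huffman block for "ABC" (BFINAL = 0), then an empty stored block:
// 3 header bits, padding to a byte boundary, LEN = 0x0000, NLEN = 0xFFFF.
echo bin2hex($flushed), "\n";                // 72747206000000ffff

// substr(..., 0, -4) drops only LEN/NLEN. The stored block's header bits
// are still in the last byte, so the stream stops in the middle of a block.
echo bin2hex(substr($flushed, 0, -4)), "\n"; // 7274720600
```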

gzdeflate, on the other hand, creates a properly terminated deflate stream, with a single deflate block marked as the last block. The only difference between the two streams is the first (least significant) bit of the first byte, which is a 1 to mark the last block.
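A small sketch to check that claim on the question's own data. It also runs on PHP 5, since it only calls gzdeflate(); the second string is copied from the question's deflate_add()/substr() result rather than regenerated.

```php
<?php
// Sketch: compare the two 5-byte streams from the question.
$terminated   = gzdeflate("ABC", 9); // "str\6\0" per the question
$unterminated = "rtr\6\0";           // deflate_add() + substr() result from the question

// XOR each byte pair to show exactly which bits differ.
for ($i = 0; $i < strlen($terminated); $i++) {
    printf("byte %d: xor %02x\n", $i, ord($terminated[$i]) ^ ord($unterminated[$i]));
}
// Only byte 0 differs, by 0x01: bit 0 of the first byte is BFINAL.
```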

You do not say how the properly terminated deflate stream is "causing everything to fall apart". In any case, you can make a properly terminated deflate stream with deflate_add by using ZLIB_FINISH instead of ZLIB_SYNC_FLUSH, and forgoing the substr.
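A sketch of that fix, assuming PHP 7+ on the sending side. With ZLIB_FINISH the only block is marked final, so the result is a complete raw deflate stream that gzinflate() (available on PHP 5) can read. For this tiny input it comes out identical to the "str\6\0" that gzdeflate() produced in the question.

```php
<?php
// Sketch: requires PHP 7+ for deflate_init()/deflate_add().
$ctx = deflate_init(ZLIB_ENCODING_RAW);

// ZLIB_FINISH marks the last block as final; no substr() afterwards.
$finished = deflate_add($ctx, "ABC", ZLIB_FINISH);

echo bin2hex($finished), "\n";   // 7374720600, i.e. "str\6\0"
echo gzinflate($finished), "\n"; // ABC
```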

There is no way to make an invalid deflate stream with gzdeflate, if that's what you're asking. You can't just change the first bit either, since a larger string may be compressed into several blocks, and then the last block is not the first block.
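A sketch of the multi-block case, using only gzdeflate() and gzinflate() (so it runs on PHP 5). Compression level 0 is chosen purely to force stored blocks: a stored block's length field is 16 bits, so 200000 bytes cannot fit in one block, and the first block's BFINAL bit must be 0. The input string is an arbitrary illustration.

```php
<?php
// Sketch: level 0 forces stored blocks, each at most 65535 bytes long,
// so this input must be split across several deflate blocks.
$big = str_repeat("A", 200000);
$deflated = gzdeflate($big, 0);

// Bit 0 of the first byte is BFINAL for the FIRST block only.
echo (ord($deflated[0]) & 1) ? "first block is final\n" : "first block is not final\n";
// prints: first block is not final

// The real final block is further along; the stream itself is still valid.
var_dump(gzinflate($deflated) === $big); // bool(true)
```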