PHP: create gz stream from plain file


I want to create a read stream that gzip-encodes a plain-text file on the fly.

The Google Cloud Storage library has an upload function that accepts a StreamInterface as a parameter (see the Bucket::upload reference).

I want to upload a .txt file, but gzip-encoded.

Uploading a plain txt file is simple:

/** @var \Google\Cloud\Storage\Bucket $bucket */
$fd = fopen('/tmp/file.txt', 'r');
$stream = \GuzzleHttp\Psr7\Utils::streamFor($fd);
$bucket->upload($stream, ['name' => 'file.txt']);

I want to create a stream that:

  • reads the original plain txt file
  • gzip-encodes it chunk by chunk

It should not hold the full file in memory (only the chunks) or write it to disk. Is this possible?

I think it should look something like the following code, but producing a gz file (instead of just applying zlib.deflate to the data):

$fd = fopen('/tmp/file.txt', 'r');
stream_filter_append($fd, 'zlib.deflate', STREAM_FILTER_READ, ['window' => 15]);
$stream = \GuzzleHttp\Psr7\Utils::streamFor($fd);
$bucket->upload($stream, ['name' => 'file.txt.gz']);

Thanks!

Answer by Sammitch (best answer)

I got a bit nerd-sniped and had to write something for this.

Reposting my above comment:

While DEFLATE is the algorithm used by gzip, it is not the format. This is laid out in the response to bugs.php.net/bug.php?id=68556. This stream filter appears to use the DEFLATE format header and trailer, and there does not currently seem to be a built-in gzip stream filter.
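
To make the distinction concrete, here is a minimal sketch comparing the leading bytes that PHP's three one-shot zlib functions produce for the same payload. The variable names are purely illustrative. Only gzencode() emits the gzip magic bytes 1f 8b that tools such as gunzip look for.

```php
<?php
// Same DEFLATE-compressed payload, three different containers.
$payload = 'hello world!';

$raw  = gzdeflate($payload);  // raw DEFLATE (RFC 1951): no header or trailer
$zlib = gzcompress($payload); // zlib container (RFC 1950): first byte is 0x78
$gzip = gzencode($payload);   // gzip container (RFC 1952): starts with 0x1f 0x8b

// Print the first two bytes of each encoding as hex.
foreach (['raw' => $raw, 'zlib' => $zlib, 'gzip' => $gzip] as $name => $bytes) {
    printf("%-4s starts with %s\n", $name, bin2hex(substr($bytes, 0, 2)));
}
```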

Well, we can shim in a call to the system's gzip binary with proc_open() and stream the data through it to create a properly formatted gzip stream.

class GzipCommandFilter extends php_user_filter {

    public $stream;
    private $ph, $pipes;

    public function onCreate(): bool {

        $this->ph = proc_open(
            [ 'gzip', '-c', '-'],
            [
                ['pipe', 'r'],
                ['pipe', 'w'],
                ['pipe', 'w']
            ],
            $this->pipes
        );

        if( $this->ph === false ) {
            return false;
        }

        stream_set_blocking($this->pipes[1], false);
        stream_set_blocking($this->pipes[2], false);

        return true;
    }

    public function filter($in, $out, &$consumed, $closing): int {
        $written = 0;

        while ($bucket = stream_bucket_make_writeable($in)) {
            fwrite($this->pipes[0], $bucket->data);
            $consumed += $bucket->datalen;

            $out_buf = stream_get_contents($this->pipes[1]);
            $written += strlen($out_buf);
            $bucket->data = $out_buf;
            stream_bucket_append($out, $bucket);
        }

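        if( $closing ) {
            fclose($this->pipes[0]); // closing stdin to signal completion
            stream_set_blocking($this->pipes[1], true);
            $tail = stream_get_contents($this->pipes[1]); // drain until gzip closes stdout so it cannot stall on a full pipe
            $this->waitOnProc(); // gzip should exit promptly once its output is drained
            stream_bucket_append($out, stream_bucket_new($this->stream, $tail));
            return PSFS_PASS_ON;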
        } else if( $written > 0 ) {
            return PSFS_PASS_ON;
        } else {
            return PSFS_FEED_ME;
        }
    }

    protected function waitOnProc($step=1000, $max=1000000) {
        $waited = 0;
        while( ($status = proc_get_status($this->ph))['running'] === true ) {
            usleep($step);
            $waited += $step;
            if( $waited >= $max ) {
                throw new \Exception('Timed out while waiting.');
            }
        }
    }
}

stream_filter_register('gzip', 'GzipCommandFilter');

And we would use it like this:

$fh = fopen('/tmp/file.txt', 'rb');
stream_filter_append($fh, 'gzip');
$data = stream_get_contents($fh);
printf("Data: %s\nDecoded: %s\n", bin2hex($data), gzdecode($data));

Which might output something like:

Data: 1f8b0800000000000003cb48cdc9c95728cf2fca495104006dc2b4030c000000
Decoded: hello world!
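
If shelling out is not an option, a similar user filter can probably be built on PHP's incremental zlib API instead. The following is a sketch of that alternative, not the approach used above. It assumes PHP 7.0+ with the zlib extension, and the class name GzipContextFilter, the filter name gzip_context and the helper open_gzip_read_stream() are hypothetical names chosen for illustration.

```php
<?php
// Hypothetical user filter: gzip-encodes whatever passes through it,
// using deflate_init()/deflate_add() rather than an external process.
class GzipContextFilter extends php_user_filter
{
    private $ctx;              // incremental deflate context
    private $finished = false; // set once the gzip trailer has been emitted

    public function onCreate(): bool
    {
        // ZLIB_ENCODING_GZIP selects the gzip container (header + CRC32 trailer).
        $this->ctx = deflate_init(ZLIB_ENCODING_GZIP);
        return $this->ctx !== false;
    }

    public function filter($in, $out, &$consumed, $closing): int
    {
        $produced = false;

        while ($bucket = stream_bucket_make_writeable($in)) {
            $consumed += $bucket->datalen;
            // ZLIB_NO_FLUSH lets zlib buffer internally, so the result may be ''.
            $chunk = deflate_add($this->ctx, $bucket->data, ZLIB_NO_FLUSH);
            if ($chunk !== false && $chunk !== '') {
                $bucket->data = $chunk;
                stream_bucket_append($out, $bucket);
                $produced = true;
            }
        }

        if ($closing && !$this->finished) {
            // ZLIB_FINISH flushes pending data and writes the gzip trailer.
            $tail = deflate_add($this->ctx, '', ZLIB_FINISH);
            stream_bucket_append($out, stream_bucket_new($this->stream, $tail));
            $this->finished = true;
            $produced = true;
        }

        return $produced ? PSFS_PASS_ON : PSFS_FEED_ME;
    }
}

stream_filter_register('gzip_context', 'GzipContextFilter');

// Hypothetical helper: open $path for reading with the filter attached.
function open_gzip_read_stream(string $path)
{
    $fh = fopen($path, 'rb');
    stream_filter_append($fh, 'gzip_context', STREAM_FILTER_READ);
    return $fh;
}
```

The handle returned by open_gzip_read_stream('/tmp/file.txt') could then be wrapped with \GuzzleHttp\Psr7\Utils::streamFor() as in the question. I have not verified how the upload behaves with it; in particular, any size derived from fstat() would describe the original file rather than the compressed data.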