I use the PHP zip:// stream wrapper to parse large XML files line by line. For example:
$stream_uri = 'zip://' . __DIR__ . '/archive.zip#foo.xml';
$reader = new XMLReader();
$reader->open( $stream_uri, null );
$reader->read();
while ( true ) {
    echo( $reader->readInnerXml() . PHP_EOL );
    if ( ! $reader->next() ) {
        break;
    }
}
Quite often an XML file will include dodgy UTF control characters that XMLReader doesn't like. So I'd like to implement a custom stream wrapper that takes the output of the zip:// stream and runs a preg_replace on each line to remove those characters.
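For concreteness, the clean-up step could look roughly like the sketch below. The function name strip_invalid_xml_chars is made up for illustration, and the pattern assumes the offending characters are the C0 control characters that XML 1.0 disallows (everything below 0x20 except tab, line feed and carriage return); a real file may need a wider character class.

```php
<?php
// Hypothetical helper: removes control characters that XML 1.0 does not allow.
// Assumption: only bytes 0x00-0x08, 0x0B, 0x0C and 0x0E-0x1F need stripping.
// Tab (0x09), line feed (0x0A) and carriage return (0x0D) are kept.
function strip_invalid_xml_chars( string $text ): string {
    // No /u modifier: these are single bytes that never occur inside a
    // multi-byte UTF-8 sequence, so a byte-wise match is safe.
    return preg_replace( '/[\x00-\x08\x0B\x0C\x0E-\x1F]/', '', $text );
}

echo strip_invalid_xml_chars( "<a>x\x01y</a>" ) . PHP_EOL; // prints <a>xy</a>
```

Because the pattern works on single bytes, it should behave the same whether it is applied per line or per arbitrary chunk of the stream.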
My dream is to be able to do this:
stream_wrapper_register( 'xmlchars', 'XML_Chars' );
$stream_uri = 'xmlchars://zip://' . __DIR__ . '/archive.zip#foo.xml';
and have XMLReader happily read the tidied-up nodes. I've figured out a way to reconstruct the zip stream URI based on the path passed to my wrapper:
class XML_Chars {
    protected $stream_uri = '';
    protected $handle;
    function stream_open( $path, $mode, $options, &$opened_path ) {
        $parsed_url = parse_url( $path );
        $this->stream_uri = 'zip:' . $parsed_url['path'] . '#' . $parsed_url['fragment'];
        return true;
    }
}
But I'm puzzled about the best way to open the zip:// stream so I can modify its output and pass the result through to the XMLReader. Can anyone give me any pointers about how to implement that?
In case it's useful to anybody else, I've found a different way to solve the problem: a stream filter. You define it like this:
And use it like this:
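A minimal sketch of such a filter, built on PHP's php_user_filter class and the php://filter wrapper, might look like the following. The class name XML_Chars_Filter and the filter name xml_chars are placeholders, the character class assumes that only the C0 control characters disallowed by XML 1.0 need removing, and a temporary file stands in for the zip entry so the example runs on its own.

```php
<?php
// Hypothetical filter class: strips disallowed control characters from each
// chunk of data as it passes through the stream.
class XML_Chars_Filter extends php_user_filter {
    public function filter( $in, $out, &$consumed, $closing ): int {
        while ( $bucket = stream_bucket_make_writeable( $in ) ) {
            // Record the original length before the data is modified.
            $consumed += $bucket->datalen;
            // Assumption: only these single-byte control characters are a problem.
            $bucket->data = preg_replace( '/[\x00-\x08\x0B\x0C\x0E-\x1F]/', '', $bucket->data );
            stream_bucket_append( $out, $bucket );
        }
        return PSFS_PASS_ON;
    }
}

// Register the filter under a placeholder name.
stream_filter_register( 'xml_chars', 'XML_Chars_Filter' );

// Demo on a temporary file that contains a stray 0x01 byte.
$tmp = tempnam( sys_get_temp_dir(), 'xml' );
file_put_contents( $tmp, "<a>x\x01y</a>" );

// php://filter applies the named read filter to the wrapped resource.
$clean = file_get_contents( 'php://filter/read=xml_chars/resource=' . $tmp );
echo $clean . PHP_EOL;

// Assumed usage with the archive from the question (not run here):
// $reader = new XMLReader();
// $reader->open( 'php://filter/read=xml_chars/resource=zip://' . __DIR__ . '/archive.zip#foo.xml' );
```

The appeal of this approach is that php://filter already knows how to wrap another stream URI, so there is no need to re-implement opening, reading and closing the inner stream by hand.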
I'd still be interested to know if anyone's figured out how to make a stream wrapper that can accept the input of another stream wrapper though, as it could be a handy tool.