Node.js - Browserify: Error on parsing tar file


I'm trying to download an uncompressed tar file over HTTP and pipe its response to the tar-stream parser for further processing. This works perfectly, without any errors, when run from the terminal. To use the same code in the browser, I generate a bundle.js file with browserify and include it in the HTML.

The tar stream contains 3 files. When this browserified code runs in the browser, it parses 2 entries successfully but raises the following error for the third:

Error: Invalid tar header. Maybe the tar is corrupted or it needs to be gunzipped?

With the same HTTP download and parsing code, however, the tar file is downloaded and parsed completely without errors on the terminal. Why is this happening?

The code looks roughly like this:

var http = require('http');
var tar = require('tar-stream');

// ...
var req = http.request(url, function (res) {
  res.pipe(tar.extract())
    .on('entry', function (header, stream, callback) {
      console.log("File found " + header.name);
      stream.on('end', function () {
        console.log("<<EOF>>");
        callback();
      });
      stream.resume(); // drain the entry so the parser moves on
    })
    .on('finish', function () {
      console.log("All files parsed");
    })
    .on('error', function (error) {
      console.log(error); // Raises the above-mentioned error here
    });
});
req.end(); // send the request
// ...

Any suggestions? Could it have something to do with the headers?
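For context on the error message: a tar archive is a sequence of 512-byte blocks, and each file entry starts with a header block whose checksum field must equal the sum of the header's bytes. As far as I can tell, tar-stream raises "Invalid tar header" when that check fails, meaning the 512 bytes it is looking at are not the header it expected. Below is a minimal sketch of such a check, assuming the POSIX ustar layout. `computeChecksum`, `storedChecksum`, `isValidHeader` and `makeHeader` are hypothetical names; this is not tar-stream's own code.

```javascript
// Sketch (POSIX ustar layout assumed): deciding whether a 512-byte block is a
// valid tar header. All function names here are hypothetical.
const BLOCK_SIZE = 512;
const CHECKSUM_OFFSET = 148; // checksum field occupies bytes 148-155
const CHECKSUM_LENGTH = 8;

// Sum of all header bytes, counting the checksum field itself as eight spaces (0x20).
function computeChecksum(header) {
  let sum = 0;
  for (let i = 0; i < BLOCK_SIZE; i++) {
    const inField = i >= CHECKSUM_OFFSET && i < CHECKSUM_OFFSET + CHECKSUM_LENGTH;
    sum += inField ? 0x20 : header[i];
  }
  return sum;
}

// The stored checksum is an octal string terminated by NUL and/or space.
function storedChecksum(header) {
  const field = header.toString('ascii', CHECKSUM_OFFSET, CHECKSUM_OFFSET + CHECKSUM_LENGTH);
  return parseInt(field.replace(/\0/g, ' ').trim(), 8);
}

function isValidHeader(header) {
  return header.length === BLOCK_SIZE && computeChecksum(header) === storedChecksum(header);
}

// Builds a minimal regular-file header for the demo (not a full tar writer).
function makeHeader(name, size) {
  const header = Buffer.alloc(BLOCK_SIZE);                       // zero-filled block
  header.write(name, 0);                                          // name (demo assumes < 100 bytes)
  header.write('0000644\0', 100);                                 // mode
  header.write(size.toString(8).padStart(11, '0') + '\0', 124);   // size in octal
  header.write('0', 156);                                         // typeflag: regular file
  header.write('ustar\0' + '00', 257);                            // magic "ustar\0" + version "00"
  const sum = computeChecksum(header);
  header.write(sum.toString(8).padStart(6, '0') + '\0 ', CHECKSUM_OFFSET);
  return header;
}

const header = makeHeader('third-file.bin', 1234);
console.log(isValidHeader(header));  // true

const damaged = Buffer.from(header); // copy the header
damaged[0] = 0x3f;                   // one byte altered in transit
console.log(isValidHeader(damaged)); // false
```

The same failure happens if the bytes are intact but the parser reads them from the wrong offset: if anything before a header gains or loses bytes, the block read as "the next header" is really file data, and its checksum won't match.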

Answer by Kris Reeves (best answer)

The problem here (and its solution) is tucked away in the http-browserify documentation. First, you need to understand a few things about browserify:

  • The browser environment is not the same as the node.js environment
  • Browserify does its best to provide node.js APIs that don't exist in the browser when the code you are browserifying needs them
  • The replacements don't behave exactly the same as in node.js, and are subject to caveats in the browser

With that in mind, you're using at least three Node-specific APIs that have browserify reimplementations/shims: network connections, buffers, and streams. In the browser, network connections are necessarily replaced by XHR calls, which have their own semantics for binary data that don't exist in Node (Node has Buffers instead). If you look at the http-browserify documentation, you'll notice an option called responseType; it sets the response type of the XHR call, and you must set it to get binary data back instead of string data. Substack suggested using ArrayBuffer. Since this must be set on the options object of http.request, you need to use the long-form request format instead of the string-URL format:

var http = require('http');

http.request({
    method: 'GET',
    hostname: 'www.site.com',
    path: '/path/to/request',
    responseType: 'arraybuffer' // note: lowercase
}, function (res) {
    // ...
}).end(); // send the request

See the XMLHttpRequest specification for the valid values of responseType; http-browserify passes the value along as-is. In Node, this key is simply ignored.
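To see why string data is a problem for a tar archive, consider what happens when arbitrary bytes are decoded as text and then turned back into bytes. The sketch below uses the standard `TextDecoder`/`TextEncoder` and assumes UTF-8 decoding (the charset an XHR text response actually uses can vary). Pure ASCII data, including NUL padding, survives unchanged. Any byte that isn't valid UTF-8 is replaced by U+FFFD, which takes three bytes, so the data changes length and every later 512-byte header is read from the wrong offset. That is one plausible explanation, not verified against your archive, for two entries parsing while the third fails: the data before the third header (for example, the second file's contents) may contain non-ASCII bytes. `roundTripThroughText` is a hypothetical helper name.

```javascript
// Sketch: binary bytes decoded as UTF-8 text and then encoded back to bytes.
// Assumption: the text path decodes as UTF-8; the real XHR charset may differ.
function roundTripThroughText(bytes) {
  const text = new TextDecoder('utf-8').decode(bytes); // invalid sequences become U+FFFD
  return new TextEncoder().encode(text);               // re-encode the string as UTF-8
}

const asciiOnly = new Uint8Array([0x61, 0x00, 0x62]);    // "a", NUL, "b"
const binary = new Uint8Array([0x61, 0x62, 0xff, 0x63]); // 0xff is never valid UTF-8

console.log(roundTripThroughText(asciiOnly).length); // 3: unchanged
console.log(roundTripThroughText(binary).length);    // 6: 0xff became EF BF BD
```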

When the response type is set to 'arraybuffer', http-browserify emits chunks as Uint8Array. Once http.request gives you Uint8Array chunks, another problem appears: the browserified Stream API only accepts strings and Buffers as input, so when you pipe the response into the tar extractor stream, you get TypeError: Invalid non-string/buffer chunk. This seems to me to be an oversight in stream-browserify, which should accept Uint8Array values to fit with the other parts of the browserified Node API.

You can work around it fairly simply yourself, though. The Buffer shim in the browser accepts a typed array in its constructor, so you can pipe the data yourself, converting each chunk to a Buffer manually:

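http.request(opts, function (res) {
    var tarExtractor = tar.extract();
    res.on('data', function (chunk) {
        // chunk is a Uint8Array: wrap it in a Buffer before writing
        // (newer Buffer implementations prefer Buffer.from(chunk); new Buffer() is deprecated)
        tarExtractor.write(new Buffer(chunk));
    });
    res.on('end', function () {
        tarExtractor.end(); // no more data: let the extractor finish
    });
    res.on('error', function (err) {
        // do something with your error
        // and clean up the tarExtractor instance if necessary
    });
}).end(); // send the request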
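`new Buffer(chunk)` matches the browser Buffer shim described above. In current Node, the Buffer constructor is deprecated and `Buffer.from` is the supported replacement; I believe recent versions of the browser `buffer` shim provide it too. `Buffer.from` can either copy a typed array or wrap its memory without copying. A short sketch of the difference, using only the Node API:

```javascript
// Two ways to turn a Uint8Array chunk into a Buffer (current Node API).
const chunk = new Uint8Array([0x75, 0x73, 0x74, 0x61, 0x72]); // bytes of "ustar"

// 1. Copy: the new Buffer owns separate memory.
const copied = Buffer.from(chunk);

// 2. View: no copy; the Buffer shares the chunk's underlying ArrayBuffer.
const view = Buffer.from(chunk.buffer, chunk.byteOffset, chunk.byteLength);

chunk[0] = 0x55; // change the original: "ustar" -> "Ustar"
console.log(copied.toString('ascii')); // ustar (unaffected)
console.log(view.toString('ascii'));   // Ustar (shares memory)
```

Copying is the safer choice if the producer might reuse the underlying memory for the next chunk; the view avoids an extra copy on large downloads.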

Your code, then, should look something like this:

var http = require('http');
var tar = require('tar-stream');

var req = http.request({
  method: 'GET',
  // Add your request hostname, path, etc. here
  responseType: 'arraybuffer'
}, function (res) {
  var tarExtractor = tar.extract();

  // Convert each Uint8Array chunk to a Buffer before handing it to tar-stream
  res.on('data', function (chunk) {
    tarExtractor.write(new Buffer(chunk));
  });
  res.on('end', tarExtractor.end.bind(tarExtractor));
  res.on('error', function (error) {
    console.log(error);
  });

  tarExtractor.on('entry', function (header, stream, callback) {
    console.log("File found " + header.name);
    stream.on('end', function () {
      console.log("<<EOF>>");
      callback();
    });
    stream.resume(); // This won't be necessary once you do something with the data
  })
  .on('finish', function () {
    console.log("All files parsed");
  })
  .on('error', function (error) {
    console.log(error); // parse errors such as "Invalid tar header" are reported here
  });
});
req.end(); // send the request
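As an alternative to calling `write()` and `end()` by hand, you could place a small Transform stream between the response and the extractor, so that `pipe()` handles end-of-stream and backpressure (the manual version ignores `write()`'s return value). This is a sketch, assuming a stream implementation that supports the `writableObjectMode` option; current Node does, but I haven't checked older stream-browserify versions. `toBufferStream` is a hypothetical name, and the demo fakes the response with `Readable.from` instead of a real request.

```javascript
const { Readable, Transform, Writable, pipeline } = require('stream');

// Hypothetical helper: converts every incoming chunk (e.g. a Uint8Array from
// http-browserify) into a Buffer, so it can feed a Buffer-only stream.
function toBufferStream() {
  return new Transform({
    writableObjectMode: true, // accept Uint8Array (or anything else) as-is on the input side
    transform(chunk, _encoding, callback) {
      callback(null, Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk));
    }
  });
}

// Demo: simulate a response that emits Uint8Array chunks.
const fakeResponse = Readable.from([new Uint8Array([1, 2]), new Uint8Array([3, 4, 5])]);
const collected = [];
const sink = new Writable({
  write(chunk, _encoding, callback) {
    collected.push(chunk); // chunks arrive here as Buffers
    callback();
  }
});

pipeline(fakeResponse, toBufferStream(), sink, (err) => {
  if (err) throw err;
  console.log(Buffer.concat(collected)); // <Buffer 01 02 03 04 05>
});
```

In the response handler, this would become something like `res.pipe(toBufferStream()).pipe(tarExtractor)`, with the same `entry`, `finish` and `error` listeners on `tarExtractor`.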