I'm trying to download a `tar` file (non-compressed) over HTTP and pipe its response to the `tar-stream` parser for further processing. This works perfectly when executed in the terminal, without any errors. To use the same code in the browser, I generate a `bundle.js` file with `browserify` and include it in the HTML.
The tar stream contains 3 files. When executed in the browser, this browserified code parses 2 entries successfully but raises the following error for the third one:
Error: Invalid tar header. Maybe the tar is corrupted or it needs to be gunzipped?
With the same HTTP download and parsing code, the tar file is downloaded and parsed completely, without errors, in the terminal. Why is this happening?
The code snippet is along these lines:
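```javascript
var http = require('http');
var tar = require('tar-stream');
// . . . .
var req = http.request(url, function (res) {
  res.pipe(tar.extract())
    .on('entry', function (header, stream, callback) {
      console.log("File found " + header.name);
      stream.on('end', function () {
        console.log("<<EOF>>");
        callback(); // ready for the next entry
      });
      stream.resume(); // drain this entry's contents
    })
    .on('finish', function () {
      console.log("All files parsed");
    })
    .on('error', function (error) {
      console.log(error); // Raises the above-mentioned error here
    });
});
req.end(); // http.request only sends the request once end() is called
// . . . .
```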
Any suggestions? Do I need to set particular headers?
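The problem here (and its solution) are tucked away in the http-browserify documentation. First, you need to understand how browserify works: when it bundles your code, it replaces Node's core modules (such as `http`, `stream`, and `buffer`) with browser reimplementations, and those shims can only approximate Node's behavior using what the browser provides.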
With that in mind, you're using at least three Node-specific APIs that have browserify reimplementations/shims: network connections, buffers, and streams. Network connections are necessarily replaced in the browser by XHR calls, which have their own semantics for binary data that don't exist in Node (Node has Buffers). If you look at the http-browserify documentation, you'll notice an option called `responseType`; it sets the response type of the XHR call, and it must be set to ensure you get binary data back instead of string data. Otherwise the body is decoded as text, so any byte sequences that aren't valid text get altered, which is enough to corrupt the tar headers that follow them. Substack suggested using `ArrayBuffer` (the `'arraybuffer'` response type). Since this option must be set on the `options` object of `http.request`, you need to use the long-form request format instead of the string-URL format. See the XHR spec for valid values for `responseType`; http-browserify passes the value along as-is. In Node, this key is simply ignored.
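As a minimal sketch, assuming you start from a URL string like the question's `url` variable, the long-form options object could be built as below. `buildRequestOptions` is a hypothetical helper, not part of http-browserify; it relies on the WHATWG `URL` class, which is a global in browsers and in Node 10+. The option names used (`method`, `host`, `port`, `path`) are the standard Node `http.request` ones; check them against the shim version you bundle.

```javascript
// Hypothetical helper: turns a URL string into the long-form options object
// accepted by http.request, adding responseType for http-browserify.
function buildRequestOptions(urlString) {
  const u = new URL(urlString);
  return {
    method: 'GET',
    host: u.hostname,
    // An empty u.port means "default port for the scheme"; leave it unset then.
    port: u.port ? Number(u.port) : undefined,
    path: u.pathname + u.search,
    // Forwarded as-is to xhr.responseType by http-browserify; ignored by Node.
    responseType: 'arraybuffer'
  };
}

// Usage (the callback body is the same tar-stream code as in the question):
//   var req = http.request(buildRequestOptions(url), function (res) { ... });
//   req.end();
```

Because Node ignores `responseType`, the same options object works unchanged in the terminal version.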
When you set the response type to `'arraybuffer'`, http-browserify will emit chunks as `Uint8Array`. Once you're getting a `Uint8Array` back from `http.request`, another problem presents itself: the `Stream` API only accepts `string` and `Buffer` for input, so when you pipe the response to the tar extractor stream, you'll receive `TypeError: Invalid non-string/buffer chunk`. This seems to me to be an oversight in `stream-browserify`, which should accept `Uint8Array` values to go along nicely with the other parts of the browserified Node API. You can fairly simply work around it yourself, though. The `Buffer` shim in the browser accepts a typed array in its constructor, so you can pipe the data yourself, converting each chunk to a `Buffer` manually. Your code, then, needs both changes: the long-form request with `responseType: 'arraybuffer'`, and the per-chunk conversion to `Buffer` before the data reaches `tar.extract()`.
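A minimal sketch of that conversion step, assuming chunks arrive as `Uint8Array` (or already as `Buffer`). `toBuffer` and `pipeTarResponse` are hypothetical names. Rather than writing chunks to the extractor by hand, this wraps the conversion in a `Transform` stream so it can sit between the response and `tar.extract()` in an ordinary `pipe` chain. Setting `writableObjectMode: true` is what lets the transform accept a `Uint8Array` at all, because object-mode streams skip the "non-string/buffer chunk" check. It uses `Buffer.from(arrayBuffer, byteOffset, length)`, the current replacement for the deprecated `new Buffer(typedArray)` constructor mentioned above; an older browser `Buffer` shim may only offer the constructor.

```javascript
const { Transform } = require('stream');

// Returns a Transform stream that turns each incoming chunk into a Buffer.
// Assumption: chunks are Uint8Array or Buffer (strings are not handled).
// The writable side is in object mode so Uint8Array chunks are accepted;
// the readable side stays in byte mode, so downstream sees plain Buffers.
function toBuffer() {
  return new Transform({
    writableObjectMode: true,
    transform(chunk, _encoding, callback) {
      if (Buffer.isBuffer(chunk)) {
        callback(null, chunk); // already a Buffer (e.g. in Node): pass through
        return;
      }
      // Wrap the Uint8Array's memory without copying; byteOffset/byteLength
      // matter when the chunk is a view into a larger ArrayBuffer.
      callback(null, Buffer.from(chunk.buffer, chunk.byteOffset, chunk.byteLength));
    }
  });
}

// Hypothetical wiring helper: res is the http.request response and extractor
// is a tar-stream extract() instance. Returns the extractor, so the
// 'entry' / 'finish' / 'error' handlers can be chained onto the result.
function pipeTarResponse(res, extractor) {
  return res.pipe(toBuffer()).pipe(extractor);
}
```

Inside the request callback, `res.pipe(tar.extract())` then becomes `pipeTarResponse(res, tar.extract())`, followed by the same handlers as before. In Node the response already emits `Buffer`s, so the transform is a pass-through and the same bundle behaves the same in both environments.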