Goal
We are standing up a low-volume site where users (browser clients) select image files (284 KB per file) and then ask a Node Express server to bundle them into a ZIP for download to the web client.
Issues & Design Constraints
- The resulting ZIP might be on the order of 50 MB to 5 GB. We would therefore like to show the user a running progress bar while the ZIP is being constructed. (We assume the browser will give running updates on the progress of the actual download.)
- We expect a low volume of requests (1-2 requests at a time). However, we do not want to completely tie up our 4-core server processor, so we want to minimize synchronous calls that block the Express server.
- Given the size of the ZIP, we cannot expect it to be assembled entirely in memory.
- Are there any other issues we should worry about?
Question
We assume that running 7zip as a child process is a bad fit, since we would not get any running status as to how many of the image files had been added to the ZIP.
So which of the following packages fit well with Node/ExpressJS, given the design constraints and goals listed above?
- archiver: https://www.npmjs.com/package/archiver
- jszip: https://www.npmjs.com/package/jszip
- easyzip: https://www.npmjs.com/package/easy-zip
- expresszip: https://www.npmjs.com/package/express-zip
- zipstream: https://www.npmjs.com/package/zip-stream
What I am seeing is that most of these packages first collect the files, then finalize the archive in memory, and only then pipe it to the HTTP response. That is probably not good for 5 GB of data, or am I missing something? Some seem to be able to use disk, but the question then is whether one gets an update event as each file is added.
Others seem to be fully async, and I don't see how you would get a running progress value as each file is added to the ZIP package.
 
                        
Of the packages listed above, most were not appropriate.
We chose Archiver, since it had most of the features desired:
As for the 7zip solution, we tend not to like reading the entrails of a standard output stream from a spawned child process.
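
For anyone wanting the per-file progress described in the question, here is a minimal sketch of how it could be derived from Archiver's 'entry' event, which is emitted as each queued file is written to the archive. The helper name trackZipProgress and the names in the commented wiring (files, jobId, progressStore) are hypothetical and not part of Archiver. The demo at the bottom uses a plain EventEmitter as a stand-in, so it runs without the package installed.

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper: counts 'entry' events and reports a percentage.
// `archive` can be any emitter that fires 'entry' once per file added.
function trackZipProgress(archive, totalFiles, onProgress) {
  let processed = 0;
  archive.on('entry', (entry) => {
    processed += 1;
    const percent = totalFiles > 0
      ? Math.round((processed / totalFiles) * 100)
      : 100;
    onProgress({ processed, totalFiles, percent, name: entry && entry.name });
  });
}

// Assumed wiring inside an Express handler (commented out, not run here):
//   const archiver = require('archiver');
//   const archive = archiver('zip');
//   trackZipProgress(archive, files.length, (p) => progressStore.set(jobId, p));
//   archive.on('error', next);
//   archive.pipe(res);              // streams to the client as it is built
//   for (const f of files) archive.file(f.path, { name: f.name });
//   archive.finalize();

// Stand-in demo: a plain emitter plays the role of the archive.
const fakeArchive = new EventEmitter();
const updates = [];
trackZipProgress(fakeArchive, 4, (p) => updates.push(p.percent));
['a.jpg', 'b.jpg', 'c.jpg', 'd.jpg'].forEach((name) => {
  fakeArchive.emit('entry', { name });
});
console.log(updates.join(', ')); // 25, 50, 75, 100
```

Because the archive is piped while it is being built, nothing close to the full 5 GB needs to sit in memory. The progress values still have to reach the browser somehow; a separate polling endpoint keyed by a job id is one option, but that part is an assumption and not something Archiver provides.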