Node.js archiver Need syntax for excluding file types via glob


Using archiver.js (for Node.js), I need to exclude images from a recursive (multi-subdir) archive. Here is my code:

const zip = archiver('zip', { zlib: { level: 9 } });
const output = await fs.createWriteStream(`backup/${fileName}.zip`);
res.setHeader('Content-disposition', `attachment; filename=${fileName}.zip`);
res.setHeader('Content-type', 'application/download');
output.on('close', function () {
  res.download(`backup/${fileName}.zip`, `${fileName}.zip`);
});
output.on('end', function () {
  res.download(`backup/${fileName}.zip`, `${fileName}.zip`);
});
zip.pipe(output);
zip.glob('**/*',
  {
    cwd: 'user_uploads',
    ignore: ['*.jpg', '*.png', '*.webp', '*.bmp'],
  },
  {});
zip.finalize();

The problem is that files matching the ignore patterns are still included in the archive. How can I correct the syntax?


There are 2 answers

3
Christos Lytras On BEST ANSWER

Archiver uses Readdir-Glob for globbing which uses minimatch to match.

The matching in Readdir-Glob (node-readdir-glob/index.js#L147) is done against the full filename including the pathname, and it does not let us apply the matchBase option, which would match against just the basename of the full path.

To make it work, you have 2 options:


1. Make your glob exclude the file extensions

You can convert your glob expression to exclude all the file extensions you don't want in your archive using the glob negation !(...). It will include everything except what matches the negation expression:

zip.glob(
  '**/!(*.jpg|*.png|*.webp|*.bmp)',
  {
    cwd: 'user_uploads',
  },
  {}
);
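
If the list of extensions changes often, the negated pattern can be generated instead of typed by hand. This is only a minimal sketch: toNegatedGlob is a hypothetical helper name, not part of archiver or Readdir-Glob, and it assumes the extensions are given without a leading dot.

```javascript
// Hypothetical helper (not part of archiver): builds a glob that matches
// every file whose basename does NOT match one of the given extensions.
// Assumes extensions are plain strings without a leading dot.
function toNegatedGlob(extensions) {
  const alternatives = extensions.map((ext) => `*.${ext}`).join('|');
  return `**/!(${alternatives})`;
}

const pattern = toNegatedGlob(['jpg', 'png', 'webp', 'bmp']);
console.log(pattern); // **/!(*.jpg|*.png|*.webp|*.bmp)
```

The resulting string can then be passed as the first argument of zip.glob(), exactly like the hand-written pattern above.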

2. Make minimatch work with the full file pathname

Since we cannot set the matchBase option, the ignore patterns have to include the directory-matching glob **/ so that minimatch matches them against the full pathname:

zip.glob(
  '**/*',
  {
    cwd: 'user_uploads',
    ignore: ['**/*.jpg', '**/*.png', '**/*.webp', '**/*.bmp'],
  },
  {}
);
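
The same idea works for the ignore list. The sketch below builds the recursive patterns from a list of extensions; toIgnorePatterns is a hypothetical helper name chosen for illustration, and it tolerates an optional leading dot.

```javascript
// Hypothetical helper (not part of archiver): turns a list of extensions
// into ignore patterns that match at any directory depth.
function toIgnorePatterns(extensions) {
  // Strip an optional leading dot so both 'png' and '.png' are accepted.
  return extensions.map((ext) => `**/*.${ext.replace(/^\./, '')}`);
}

const ignore = toIgnorePatterns(['jpg', '.png', 'webp', 'bmp']);
console.log(ignore);
```

The ignore array can then be used as the ignore option, for example zip.glob('**/*', { cwd: 'user_uploads', ignore }).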

Behaviour

This behaviour of Readdir-Glob is a bit confusing regarding the ignore option:

Options

ignore: Glob pattern or Array of Glob patterns to exclude matches. If a file or a folder matches at least one of the provided patterns, it's not returned. It doesn't prevent files from folder content to be returned.

This means that ignore items have to be actual glob expressions that cover the whole path/file expression. When we specify *.jpg, it matches files only in the root directory and not in the subdirectories. If we want to exclude JPG files deep in the directory tree, we have to combine the match-all-directories pattern with the file extension pattern, which gives **/*.jpg.
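
To see why the pattern has to cover the whole relative path, here is a rough illustration. globToRegExp is a deliberately simplified stand-in written for this example, not minimatch's real implementation: it only understands * (any characters inside one path segment) and **/ (zero or more directories), and it ignores details such as dotfiles and extglobs.

```javascript
// Simplified, illustrative glob-to-RegExp converter (NOT minimatch itself).
// Supports only '*' (stays within one path segment) and '**/' (any depth).
function globToRegExp(glob) {
  const source = glob
    .replace(/[.+?^${}()|[\]\\]/g, '\\$&') // escape regex metacharacters, keep '*'
    .replace(/\*\*\//g, '\u0000')          // temporary placeholder for '**/'
    .replace(/\*/g, '[^/]*')               // '*' never crosses a '/'
    .replace(/\u0000/g, '(?:[^/]+/)*');    // '**/' spans zero or more directories
  return new RegExp(`^${source}$`);        // anchored: the whole path must match
}

const paths = ['a.jpg', 'photos/b.jpg', 'photos/2020/c.jpg', 'notes.txt'];

for (const pattern of ['*.jpg', '**/*.jpg']) {
  const re = globToRegExp(pattern);
  console.log(pattern, '=>', paths.filter((p) => re.test(p)));
}
// '*.jpg' matches only a.jpg;
// '**/*.jpg' matches a.jpg, photos/b.jpg and photos/2020/c.jpg.
```

Because the match is anchored to the whole relative path, a bare *.jpg cannot reach into photos/, which is the behaviour described above.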

Exclude only in subdirectories

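If you want the exclusion to depend on the parent directory, you can add a directory segment to the ignore pattern. Keep in mind that the extglob !(Subdir) is a negation: it matches any directory name except Subdir.

// The ignore pattern '**/!(Subdir)/*.jpg' excludes the JPG files whose parent
// directory is NOT named 'Subdir'. To exclude JPG files only inside 'Subdir/'
// directories, use '**/Subdir/*.jpg' instead.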

zip.glob(
  '**/*',
  {
    cwd: 'user_uploads',
    ignore: ['**/!(Subdir)/*.jpg'],
  },
  {}
);
1
ralf htp On

The following code works with this directory structure:

node-app
    |
    |_ upload
         |_subdir1
         |_subdir2
         |_...

In the code, __dirname is the node-app directory (the directory where your app resides). The code is an adaptation of the code in the Quick Start section of https://www.archiverjs.com/

// require modules
const fs = require('fs');
const archiver = require('archiver');

// create a file to stream archive data to.
const output = fs.createWriteStream(__dirname + '/example.zip');
const archive = archiver('zip', {
  zlib: { level: 9 } // Sets the compression level.
});

// listen for all archive data to be written
// 'close' event is fired only when a file descriptor is involved
output.on('close', function() {
  console.log(archive.pointer() + ' total bytes');
  console.log('archiver has been finalized and the output file descriptor has closed.');
});

// This event is fired when the data source is drained no matter what was the data source.
// It is not part of this library but rather from the NodeJS Stream API.
// @see: https://nodejs.org/api/stream.html#stream_event_end
output.on('end', function() {
  console.log('Data has been drained');
});

// good practice to catch warnings (ie stat failures and other non-blocking errors)
archive.on('warning', function(err) {
  if (err.code === 'ENOENT') {
    // log warning
  } else {
    // throw error
    throw err;
  }
});

// good practice to catch this error explicitly
archive.on('error', function(err) {
  throw err;
});

// pipe archive data to the file
archive.pipe(output);

    
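// append every file under upload/, skipping images at any depth;
// ignore patterns are matched against the path relative to cwd,
// so a bare '*.png' would only exclude files at the top level of upload/
archive.glob('**', {
  cwd: __dirname + '/upload',
  ignore: ['**/*.png', '**/*.jpg']
});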

// finalize the archive (ie we are done appending files but streams have to finish yet)
// 'close', 'end' or 'finish' may be fired right after calling this method so register to them beforehand
archive.finalize();

glob is short for 'global', and glob patterns use wildcards such as * in file names (https://en.wikipedia.org/wiki/Glob_(programming)). So a suitable wildcard expression is *.jpg, *.png, ... depending on the file types you want to exclude. In archiver's ignore option these patterns are matched against the path relative to cwd, so prefix them with **/ (for example **/*.jpg) to also exclude files in subdirectories. In general, the asterisk wildcard * stands for any number of literal characters, or an empty string, in the context of file systems (file and directory names, https://en.wikipedia.org/wiki/Wildcard_character).

See also node.js - Archiving folder using archiver generate an empty zip