Is there a utility for estimating a file's size after compression?

I would like to estimate the final size of a file, files, or a directory of files after it has been compressed. I'm looking for a program or script that can estimate/calculate this.

Any ideas?

Such a tool must be accessible on the command line (for Linux/Mac). It would be most useful if it would work with all or most of the commonly-used lossless compression algorithms (gz, bzip2, zip, etc.). Bonus points if it listed the compression ratios (or of equivalent use, the resulting file size) for a variety of methods. I fully expect that such a tool would scan the file prior to producing output, but I want to avoid any actual compression, if possible.

If it matters, I'd prefer that this be general-purpose:

  • It should work well for any kind of file(s), including easily-compressed ASCII text files, binary data, and everything in between. (Of course, this depends wildly on the compression algorithm/tool.)
  • It should work with a variety of compression algorithms/tools

The following Bash script does what I want for one compression algorithm, but it doesn't count because it performs the actual compression (I'd like an estimate):

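#!/bin/bash

# Collect the files with a glob instead of parsing the output of ls.
FILES_TO_COMPRESS=(./*txt)
TEMP_FILE=myData.tgz
# Create a gzip-compressed tarball (this is the real compression I want to avoid).
tar -zcvf "$TEMP_FILE" "${FILES_TO_COMPRESS[@]}"
# Print only the human-readable size column.
du -h "$TEMP_FILE" | awk '{print $1}'
rm -f "$TEMP_FILE"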

I would primarily use this for larger files (larger than a gigabyte), which is why I want only the estimate, and not an actual compression.

There is 1 answer

Basile Starynkevitch

You could compress into a pipe such as | wc -c (using pipe(7)-s or fifo(7)-s, perhaps with bash coprocesses), which avoids writing the compressed output to disk, but you still need to compress.

(Unless you are very tight on disk space, I believe it is not worth the pain.)
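
The pipe approach could look roughly like the sketch below. It still runs the full compression; it only avoids the temporary file. The function name compressed_size is a made-up helper for illustration, and the sketch assumes that gzip and bzip2 are installed and that each tool accepts -c to write to standard output.

```shell
#!/bin/bash
# compressed_size is a hypothetical helper, not a standard utility.
# It prints the compressed size in bytes without writing the result to disk.
compressed_size() {
    local tool=$1 path=$2
    if [ -d "$path" ]; then
        # Directories are archived with tar first, then streamed to the compressor.
        tar -cf - "$path" 2>/dev/null | "$tool" -c | wc -c | tr -d ' '
    else
        # Regular files are fed to the compressor on standard input.
        "$tool" -c < "$path" | wc -c | tr -d ' '
    fi
}

# Compare gzip and bzip2 for every path given on the command line.
for path in "$@"; do
    for tool in gzip bzip2; do
        printf '%s\t%s\t%s bytes\n' "$path" "$tool" "$(compressed_size "$tool" "$path")"
    done
done
```

Saved under a name of your choice and run with one or more files or directories as arguments, this prints one line per path and tool. The tr -d ' ' is there because some wc implementations pad the count with spaces.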

Note that not every file is genuinely compressible: data that is already compressed or that looks random will barely shrink, and may even grow slightly.
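
To see how much compressibility can vary, here is a small experiment, again only a sketch. ratio_percent is a hypothetical helper name, the test data is generated from /dev/urandom and yes, and gzip stands in for whichever compressor you care about. The helper assumes a non-empty file.

```shell
#!/bin/bash
# ratio_percent is a hypothetical helper: it prints the gzip-compressed size
# as an integer percentage of the original size (the file must not be empty).
ratio_percent() {
    local file=$1 original compressed
    original=$(wc -c < "$file" | tr -d ' ')
    compressed=$(gzip -c < "$file" | wc -c | tr -d ' ')
    echo $(( compressed * 100 / original ))
}

tmpdir=$(mktemp -d)
# 1 MiB of random bytes: essentially incompressible.
head -c 1048576 /dev/urandom > "$tmpdir/random.bin"
# 1 MiB of one repeated line of text: highly compressible.
yes 'the quick brown fox jumps over the lazy dog' | head -c 1048576 > "$tmpdir/text.txt"

echo "random: $(ratio_percent "$tmpdir/random.bin")% of original size"
echo "text:   $(ratio_percent "$tmpdir/text.txt")% of original size"
rm -rf "$tmpdir"
```

The random file should stay at roughly its original size while the repeated text shrinks to a small fraction, which is why any estimate has to look at the actual data.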