I would like to estimate the final size of a file, a set of files, or a directory of files after compression. I'm looking for a program or script that can estimate or calculate this.
Any ideas?
Such a tool must be accessible from the command line (on Linux/Mac). It would be most useful if it worked with all or most of the commonly-used lossless compression algorithms (gzip, bzip2, zip, etc.). Bonus points if it listed the compression ratios (or, equivalently, the resulting file sizes) for a variety of methods. I fully expect that such a tool would need to scan the file before producing output, but I want to avoid any actual compression, if possible.
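One rough approach is to compress only a sample of the file and extrapolate. Below is a minimal sketch, assuming gzip as the target format and assuming the first 10 MiB is representative of the whole file (neither is given in the question):

#!/bin/bash
# Rough estimate: gzip only the first 10 MiB and scale up linearly.
# Assumes the sampled region is representative of the rest of the file,
# which often is not true; treat the output as an estimate only.
FILE="$1"
SAMPLE_BYTES=$((10 * 1024 * 1024))
TOTAL=$(stat -c %s "$FILE" 2>/dev/null || stat -f %z "$FILE")  # GNU stat, then BSD stat
SAMPLED=$(head -c "$SAMPLE_BYTES" "$FILE" | gzip -c | wc -c)
if [ "$TOTAL" -le "$SAMPLE_BYTES" ]; then
    echo $((SAMPLED))  # the whole file fit in the sample; this is the real size
else
    echo $(( SAMPLED * TOTAL / SAMPLE_BYTES ))  # linear extrapolation
fi

The accuracy depends entirely on how uniform the file is; a file with a compressible header and incompressible body will fool it.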
If it matters, I'd prefer that this be general-purpose:
- It should work well for any kind of file(s), including easily-compressed ASCII text files, binary data, and everything in between. (Of course, this depends wildly on the compression algorithm/tool.)
- It should work with a variety of compression algorithms/tools.
The following Bash script does what I want for one kind of compression algorithm, but it doesn't count, because it performs the actual compression (I'd like an estimate):
#!/bin/bash
# Archive and compress all .txt files, report the compressed size, then clean up.
TEMP_FILE=myData.tgz
tar -czf "$TEMP_FILE" ./*.txt
du -h "$TEMP_FILE" | awk '{print $1}'
rm -f "$TEMP_FILE"
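For the "bonus points" comparison across methods, the same idea can be looped over several compressors, piping the archive through each and counting bytes so that no temporary file is written. A sketch, assuming gzip, bzip2, and xz are installed (each still performs the full compression):

#!/bin/bash
# Stream the tar archive through each compressor and count the output bytes.
# No temporary file is created, but each compressor still does all the work.
for CMD in gzip bzip2 xz; do
    SIZE=$(tar -cf - ./*.txt | "$CMD" -c | wc -c)
    echo "$CMD: $SIZE bytes"
done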
I would primarily use this on large files (over a gigabyte), which is why I want only an estimate, not an actual compression.
You might compress into a pipe, e.g. piping into wc (you could use pipe(7)s or fifo(7)s, perhaps with Bash coprocesses), but you still need to run the compression. Unless you are very tight on disk space, I believe it is not worth the pain.
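For example, something like this prints the compressed size in bytes without writing anything to disk (bigfile is a placeholder name):

gzip -c bigfile | wc -c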
Notice that not every file is genuinely compressible.