How do I grep GZ files to extract PNG files?

OK, so I have a ton of .GZ files in a folder, and I want to go through each of them recursively and extract all the PNG files into another destination folder. How would I do that?

EDIT:

I've been using the command below from the terminal to find a string in a GZ file and copy the entire file to another destination directory, and then do stuff with it. There are a few drawbacks. First, when I search for "PNG", it also finds files, such as CSS files, that merely reference "PNG" rather than being PNG files. Second, it only copies the entire GZ file to the destination directory. I'd like to extract the PNG files instead.

find . -type f -print0 | xargs -0 grep -lh "png" | xargs -I % cp % /some_destination

EDIT:

Here's an example folder structure:

FILE001.GZ, FILE002.GZ, FILE003.GZ, etc

Not all of them contain PNGs, and some of them contain many files in a folder structure. What I want is the following in another destination folder:

34950560.png, 3959560.png, etc.

Thank you ahead of time!

There are 2 answers

Answer by henfiber (best answer)

Assuming that your ".GZ" files are actually gzipped ".tar" archives with multiple files, then you can accomplish your goal in one line:

find . -type f -iname '*.GZ' -print0 | xargs -0 -I'{}' tar -C "/path/to/extract" -xf '{}' --wildcards '*.png' 2>/dev/null

Explanation:

  • find . -type f -iname '*.GZ' -print0 : find all .GZ files under the current path (including subdirectories). -iname matches case-insensitively, so both .gz and .GZ files are found. -print0 separates the names with NUL bytes, so names containing spaces survive the pipe.
  • xargs -0 -I'{}' <command> '{}' : read the NUL-separated names (-0) from stdin and run 'command' once per name, substituting the name for the placeholder {}.
  • tar -C "/path/to/extract" -xf '{}' --wildcards '*.png' : extract from the archive supplied by xargs (-xf {}) only the members matching '*.png'. -C /path/to/extract sets the directory to extract into, which must already exist. GNU tar needs --wildcards to treat the member argument as a pattern; other tar implementations may behave differently.
  • 2>/dev/null : mute the error messages raised for GZ files that contain no .png files.

This command extracts all .png files into the specified folder, preserving any directory structure from the original tar.gz files. If identically named .png files exist in several archives, only one copy survives: the one extracted last overwrites the earlier ones. To avoid this, you'll need a more complex script like:

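#!/usr/bin/env bash

# Usage: extract_png <dir to search> <dir to save into>
# The save directory should be an absolute path, because the function changes directory.
function extract_png() {
    local gzpath=$1; local extract_path=$2
    cd "$gzpath" || return 2
    find . -type f -iname '*.GZ' |
        while IFS= read -r gzfile; do
            # Proceed only if the archive lists at least one .png member
            # (--wildcards is the GNU tar option that enables pattern matching)
            if tar -tf "$gzfile" --wildcards '*.png' >/dev/null 2>&1; then
                # ./sub/FILE001.GZ -> FILE001
                local basename=${gzfile%.*}; basename=${basename##*/}
                local extract_to="$extract_path/$basename"
                mkdir -p "${extract_to}"
                tar -C "$extract_to" -xf "$gzfile" --wildcards '*.png'
            fi
        done
}

extract_png '/path/to/search' '/path/to/save'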

The extract_png function will save extracted .png files to a different subfolder for each archive, under /path/to/save (e.g. /path/to/save/FILE001/, /path/to/save/FILE002/ etc).
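The question asks for all images in one destination folder rather than one subfolder per archive. A minimal sketch of a possible follow-up step, assuming the per-archive layout described above. The names flatten_pngs and /path/to/flat are hypothetical and not part of the original answer. Each copy is prefixed with its archive name, so equal file names from different archives do not clash (equal names inside the same archive still would).

```shell
# flatten_pngs SRC DEST  (hypothetical helper name)
# Copies SRC/<archive>/.../<name>.png to DEST/<archive>_<name>.png.
# The body runs in a subshell, so its variables do not leak to the caller.
flatten_pngs() (
    src=${1%/}; dest=$2
    mkdir -p "$dest" || exit 2
    find "$src" -type f -name '*.[pP][nN][gG]' -exec sh -c '
        src=$1; dest=$2; shift 2
        for png in "$@"; do
            rel=${png#"$src"/}            # path relative to SRC
            case $rel in
                */*) archive=${rel%%/*}   # first path component = archive name
                     cp "$png" "$dest/${archive}_${png##*/}" ;;
            esac                          # files directly inside SRC are skipped
        done' sh "$src" "$dest" {} +
)

# Example call (paths are placeholders):
#   flatten_pngs /path/to/save /path/to/flat
```

With the layout above, /path/to/save/FILE001/img/1.png would end up as /path/to/flat/FILE001_1.png.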

An explanation of the tar -tf test in the if condition: it is true if there are .png files in "$gzfile". The -t option tells tar to list the archive's contents. When no member matches the pattern (*.png), tar -t prints an error message (hidden by the redirection to /dev/null) and returns a non-zero exit code, which makes the condition false.
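Tar implementations differ in how they treat a wildcard such as '*.png' given as a member name (GNU tar, for example, expects --wildcards). A minimal sketch of the same "does this archive contain a PNG?" test that sidesteps the difference by listing every member and filtering the names with grep. The name has_png is hypothetical, and the match is case-insensitive, which is an assumption about what is wanted.

```shell
# has_png ARCHIVE  (hypothetical helper name)
# Succeeds (exit status 0) if the archive lists at least one member whose
# name ends in .png, compared case-insensitively. Files that tar cannot
# read produce no listing, so the test fails for them as well.
has_png() {
    tar -tf "$1" 2>/dev/null | grep -qi '\.png$'
}

# Example use inside a loop (gzfile is a placeholder variable):
#   if has_png "$gzfile"; then echo "$gzfile contains PNG files"; fi
```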

Answer by Adam D

You can use file signatures (also known as magic numbers). The first few bytes of a PNG file are a signature that identifies the file as a PNG. If the files are all gzip'd, then there's an extra header from gzip, which we can skip.
od is a command that dumps parts of a file in a readable format you specify. I tell it to skip the gzip header and dump in hex format. In my tests, the dump of a gzipped PNG contains the string "34e6 5580". If a file's dump matches, copy it to the new directory under a new name.

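COUNTER=0; mkdir -p PNGDIR
# Check every regular file in the current directory
for FILE in *; do
   [ -f "$FILE" ] || continue
   # Skip 4 bytes, dump the next 10 as hex words, and look for the signature
   if od -j 4 -N 10 -x "$FILE" | grep -q "34e6 5580"; then
     COUNTER=$((COUNTER + 1))
     cp "$FILE" "PNGDIR/picture_${COUNTER}.png.gz"
   fi
done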
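A variation on the signature idea, offered as a sketch rather than as the method above: instead of looking at a fixed offset in the compressed bytes, decompress the stream and compare its first eight bytes with the PNG signature (hex 89 50 4E 47 0D 0A 1A 0A). This assumes each .GZ file is a single gzipped file rather than a tar archive. The name is_gzipped_png is hypothetical.

```shell
# is_gzipped_png FILE  (hypothetical helper name)
# Succeeds if FILE is gzip data whose decompressed content starts with the
# 8-byte PNG signature. Runs in a subshell so "sig" does not leak out.
is_gzipped_png() (
    # Decompress to stdout, dump the first 8 bytes as hex, strip whitespace
    sig=$(gzip -dc "$1" 2>/dev/null | od -An -tx1 -N 8 | tr -d ' \t\n')
    [ "$sig" = "89504e470d0a1a0a" ]
)

# Example use (PNGDIR as in the loop above):
#   for f in *; do
#       if [ -f "$f" ] && is_gzipped_png "$f"; then
#           gzip -dc "$f" > "PNGDIR/${f%.*}.png"
#       fi
#   done
```

Because the check runs on the decompressed data, a gzipped CSS file that merely mentions "png" does not match.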