How can I write a Python script to unzip .tar.gz files?


I am trying to write a script that unzips all the .tar.gz files in the folders of one directory. For example, I have a file called testing.tar.gz. If I do it manually, I click "extract here" and the .tar.gz file produces a new file called testing.tar. If I then click "extract here" again, the .tar file gives me all the .pdf files.

How can I do this in Python? Here is my code, but it doesn't really seem to work.

import os
import tarfile
import zipfile

def extract_file(path, to_directory='.'):
    if path.endswith('.zip'):
        opener, mode = zipfile.ZipFile, 'r'
    elif path.endswith('.tar.gz') or path.endswith('.tgz'):
        opener, mode = tarfile.open, 'r:gz'
    elif path.endswith('.tar.bz2') or path.endswith('.tbz'):
        opener, mode = tarfile.open, 'r:bz2'
    else: 
        raise ValueError, "Could not extract `%s` as no appropriate extractor is found" % path

    cwd = os.getcwd()
    os.chdir(to_directory)

    try:
        file = opener(path, mode)
        try: file.extractall()
        finally: file.close()
    finally:
        os.chdir(cwd)

There are 7 answers

10
Lye Heng Foo On

Why "press" twice to extract a .tar.gz when you can do it in one step? Here is some simple code that extracts both .tar and .tar.gz in one go:

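import tarfile

fname = "testing.tar.gz"  # archive path; example name taken from the question

if fname.endswith(".tar.gz"):
    # gzip-compressed tar: decompress and unpack in a single step
    with tarfile.open(fname, "r:gz") as tar:
        tar.extractall()
elif fname.endswith(".tar"):
    # plain, uncompressed tar
    with tarfile.open(fname, "r:") as tar:
        tar.extractall()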
0
Hafizur Rahman On

The following worked for me for a .tar.gz file. It will extract files in your specified destination:

import tarfile

from os import makedirs

src_path = 'path/to/my/source_file.tar.gz'
dst_path = 'path/to/my/destination'

# create the destination dir (and any missing parents) if it does not exist
makedirs(dst_path, exist_ok=True)

if src_path.endswith('tar.gz'):
    tar = tarfile.open(src_path, 'r:gz')
    tar.extractall(dst_path)
    tar.close()
0
arunppsg On

If you are using Python in a Jupyter notebook on a Linux machine, the following will do:

!tar -xvzf /path/to/file.tar.gz -C /path/to/save_directory

The leading ! tells the notebook to run the rest of the line as a shell command.

0
Taras Vaskiv On

Using a context manager:

import os
import tarfile
# ... other code ...
with tarfile.open(os.path.join(os.environ['BACKUP_DIR'],
                  f'Backup_{self.batch_id}.tar.gz'), "r:gz") as so:
    so.extractall(path=os.environ['BACKUP_DIR'])
0
Ehsan On

You can execute a shell script from Python using envoy:

import envoy # pip install envoy

if (file.endswith("tar.gz")):
    envoy.run("tar xzf %s -C %s" % (file, to_directory))

elif (file.endswith("tar")):
    envoy.run("tar xf %s -C %s" % (file, to_directory))
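
The same idea can be sketched with the standard library's subprocess module instead of envoy (a swapped-in technique, not what this answer uses). It assumes a tar executable is available on the PATH; tar_command and extract_with_tar are hypothetical helper names chosen for illustration. Passing the arguments as a list means paths containing spaces need no quoting.

```python
import subprocess


def tar_command(archive, to_directory):
    """Build the tar argument list for a .tar.gz or .tar archive."""
    if archive.endswith(".tar.gz"):
        flags = "xzf"  # extract, gunzip, read from file
    elif archive.endswith(".tar"):
        flags = "xf"   # extract, read from file
    else:
        raise ValueError("Unsupported archive type: %s" % archive)
    return ["tar", flags, archive, "-C", to_directory]


def extract_with_tar(archive, to_directory):
    # check=True raises CalledProcessError if tar exits with a non-zero status.
    subprocess.run(tar_command(archive, to_directory), check=True)
```

The destination directory has to exist before tar is called, since -C only changes into it.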
0
Beckett O'Brien On

When I ran your program, it worked perfectly for a .tar.gz and a .tgz file. It didn't give me the correct items when I opened the .zip, and .tbz was the only format that raised an error. I think you used the wrong method to unpack a .tbz, because the error said I had an incorrect file type when I didn't. The .zip extraction returned a _MACOSX folder with nothing inside it, even though I entered the path correctly; one way to work around that is to call os.system() and unzip from the command line (the exact command depends on your OS). The only other error I encountered was the improper syntax you used for raising an error.
This is what you should have used:

raise ValueError("Error message here")

You used a comma and no parentheses. Hope this helps!
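
For reference, here is a minimal sketch of the function from the question with that raise fixed for Python 3. It also makes one change the answer above does not mention, so treat it as an assumption about the intent: the destination is passed to extractall() rather than reached with os.chdir(), because changing directory first makes a relative archive path resolve against the wrong folder.

```python
import os
import tarfile
import zipfile


def extract_file(path, to_directory="."):
    """Extract a .zip, .tar.gz/.tgz or .tar.bz2/.tbz archive into to_directory."""
    if path.endswith(".zip"):
        opener, mode = zipfile.ZipFile, "r"
    elif path.endswith((".tar.gz", ".tgz")):
        opener, mode = tarfile.open, "r:gz"
    elif path.endswith((".tar.bz2", ".tbz")):
        opener, mode = tarfile.open, "r:bz2"
    else:
        # Python 3 syntax: the exception is called like a function.
        raise ValueError(
            "Could not extract `%s` as no appropriate extractor is found" % path
        )

    os.makedirs(to_directory, exist_ok=True)
    # Both ZipFile and TarFile are context managers and accept a target
    # directory in extractall(), so no os.chdir() is needed.
    with opener(path, mode) as archive:
        archive.extractall(to_directory)
```

Calling extract_file("testing.tar.gz", "out") would then decompress and unpack in one step, leaving the archive's contents under out/.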

4
mickours On

If you are using Python 3, you should use shutil.unpack_archive, which works for most of the common archive formats.

shutil.unpack_archive(filename[, extract_dir[, format]])

Unpack an archive. filename is the full path of the archive. extract_dir is the name of the target directory where the archive is unpacked. If not provided, the current working directory is used.

For example:

import shutil

def extract_all(archives, extract_path):
    # unpack each archive into the same target directory
    for filename in archives:
        shutil.unpack_archive(filename, extract_path)
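
To cover the original goal of unpacking every .tar.gz found in the folders of one directory, the same call can be combined with os.walk. This is only a sketch: extract_all_tar_gz is a hypothetical name, and it assumes each archive should be unpacked into the folder it sits in.

```python
import os
import shutil


def extract_all_tar_gz(root_dir):
    """Unpack every .tar.gz under root_dir next to the archive itself.

    Returns the list of archive paths that were unpacked.
    """
    extracted = []
    for folder, _subdirs, files in os.walk(root_dir):
        for name in files:
            if name.endswith(".tar.gz"):
                archive = os.path.join(folder, name)
                # unpack_archive picks the gzip-tar format from the extension
                # and decompresses and untars in one call.
                shutil.unpack_archive(archive, folder)
                extracted.append(archive)
    return extracted
```

Because decompression and unpacking happen in a single call, no intermediate .tar file is left behind.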