I have a .tar.gz file which may have the following files:
folder1/folder2/folder3/imp_folder1/file11.jpg
folder1/folder2/folder3/imp_folder1/file12.jpg
folder1/folder2/folder3/imp_folder2/file21.jpg
folder1/folder2/folder3/imp_folder3/file31.jpg
...
...
I want to untar it to the following directories:
/new_folder1/new_folder2/imp_folder1/file11.jpg
/new_folder1/new_folder2/imp_folder1/file12.jpg
/new_folder1/new_folder2/imp_folder2/file21.jpg
/new_folder1/new_folder2/imp_folder3/file31.jpg
...
...
Basically, "folder1/folder2/folder3/" should be replaced by "/new_folder1/new_folder2/", and if the "imp" directories are not present, they have to be created.
Right now I have an implementation that loops through all the members in the tar, builds the folder names, and then does the following for each member:
input_file = tar.extractfile(member)
# Open in binary mode: the image data is bytes, not text
with open(image_path_local, 'wb') as output_file:
    output_file.write(input_file.read())
input_file.close()
This process is too slow. Since there are many files (on the order of 100k), what would be the fastest way to achieve this?
You need to use tar's --transform option. This posting discusses the usage of that option for a similar problem.
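As a rough sketch of how the option might be applied to the layout in the question, assuming GNU tar (other tar implementations may not support --transform): the scratch directory, the sample.tar.gz name, and the empty placeholder .jpg files are made up for illustration, and the destination is placed under the scratch directory rather than at the filesystem root.

```shell
#!/bin/sh
set -eu

# Scratch area so the sketch is self-contained (hypothetical names).
work=$(mktemp -d)
cd "$work"

# Build a small archive that mimics the layout in the question.
# Only the files are listed, so the archive has no directory entries.
mkdir -p folder1/folder2/folder3/imp_folder1 folder1/folder2/folder3/imp_folder2
: > folder1/folder2/folder3/imp_folder1/file11.jpg
: > folder1/folder2/folder3/imp_folder2/file21.jpg
tar -czf sample.tar.gz \
    folder1/folder2/folder3/imp_folder1/file11.jpg \
    folder1/folder2/folder3/imp_folder2/file21.jpg

# Destination root; the question uses /new_folder1/new_folder2.
dest="$work/new_folder1/new_folder2"
mkdir -p "$dest"

# --transform applies a sed-style substitution to every member name.
# Here it strips the leading prefix, and -C makes tar extract under $dest.
# tar creates the missing imp_folder* directories itself.
tar -xzf sample.tar.gz -C "$dest" \
    --transform='s,^folder1/folder2/folder3/,,'

# List what was extracted.
find "$dest" -type f | sort
```

The whole archive is unpacked in a single tar invocation, with no per-file loop in Python. If a real archive also stores the parent directories (folder1/, folder1/folder2/) as separate entries, those names do not match the pattern and would likely be extracted unchanged as empty directories, so the expression may need adjusting.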
Here is a demo of the option's usage:
The session output for that script is as follows: