Mass rename objects on Google Cloud Storage


Is it possible to mass rename objects on Google Cloud Storage using gsutil (or some other tool)? I am trying to figure out a way to rename a bunch of images from *.JPG to *.jpg.


There are 4 answers

4
HayatoY On

https://cloud.google.com/storage/docs/gsutil/addlhelp/WildcardNames

gsutil supports URI wildcards

EDIT

From the gsutil 3.0 release notes:

As part of the bucket sub-directory support we changed the * wildcard to match only up to directory boundaries, and introduced the new ** wildcard...

Do you have directories under the bucket? If so, you may need to go into each directory, or use **.

gsutil -m mv gs://my_bucket/**.JPG gs://my_bucket/**.jpg

or

gsutil -m mv gs://my_bucket/mydir/*.JPG gs://my_bucket/mydir/*.jpg

EDIT
gsutil doesn't support wildcards in the destination so far (as of 4/12/'14), and neither does the API.

So at the moment you need to retrieve the list of all JPG files and rename each file.

Python example:

import subprocess

# List all .JPG objects; decode the bytes output and drop the trailing empty entry
files = subprocess.check_output("gsutil ls gs://my_bucket/*.JPG", shell=True)
files = files.decode().split("\n")[:-1]
for f in files:
    # Replace the .JPG suffix with .jpg and move the object in place
    subprocess.call("gsutil mv %s %s" % (f, f[:-3] + "jpg"), shell=True)

Please note that this can take hours for a large number of files.
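If the serial loop is too slow, a sketch along the same lines can overlap the per-object moves with a thread pool (the bucket name is the same example as above, and the worker count of 20 is arbitrary):

import subprocess
from concurrent.futures import ThreadPoolExecutor

# List the .JPG objects once, then fan the per-file moves out to worker threads
files = subprocess.check_output("gsutil ls gs://my_bucket/*.JPG", shell=True)
files = [f for f in files.decode().split("\n") if f]

def rename(f):
    # Replace the .JPG suffix with .jpg; bucket and path stay the same
    subprocess.call("gsutil mv %s %s" % (f, f[:-3] + "jpg"), shell=True)

with ThreadPoolExecutor(max_workers=20) as pool:
    pool.map(rename, files)

Each worker still shells out to gsutil, so the gain comes purely from overlapping the per-object round trips.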

8
Andrei Volgin On
1
Iñigo González On

gsutil does not support parallelized mass copy/rename.

You have two options:

  • use a dataflow process to do the operation or
  • use GNU parallel to launch it using several processes

If you use GNU Parallel, it is better to deploy a new instance to do the mass copy/rename operation:

  • First: Make a list of the files you want to copy/rename (a file with source and destination separated by a space or tab), like this (one way to generate this file is sketched below):
gs://origin_bucket/path/file gs://dest_bucket/new_path/new_filename
  • Second: Launch a new Compute Engine instance.
  • Third: Log in to that instance and install GNU Parallel:
sudo apt install parallel
  • Fourth: Authorize yourself with Google (gcloud auth login), because the Compute Engine service account might not have permission to move/rename the files:
gcloud auth login
  • Fifth: Run the copy (gsutil cp) or move (gsutil mv) operation with parallel:
   parallel -j 20 --colsep ' ' gsutil mv {1} {2} :::: file_with_source_destination_uris.txt

This will run 20 parallel gsutil mv operations.
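The answer does not show how to build file_with_source_destination_uris.txt; a minimal sketch in Python, reusing the example bucket name from the other answers and assuming the in-place .JPG-to-.jpg rename from the question, could look like this:

import subprocess

# List every .JPG object and write "source destination" pairs, one per line
out = subprocess.check_output(["gsutil", "ls", "gs://my_bucket/**.JPG"])
with open("file_with_source_destination_uris.txt", "w") as fh:
    for src in out.decode().splitlines():
        if src:
            fh.write("%s %s\n" % (src, src[:-4] + ".jpg"))

Passing the arguments to gsutil as a list keeps the shell from touching the ** wildcard, so gsutil expands it across directory boundaries itself.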

1
beetlejuice On

Here is a native way to do this in bash, with a line-by-line explanation of the code below:

gsutil ls gs://bucket_name/*.JPG > src-rename-list.txt
sed 's/\.JPG/\.jpg/g' src-rename-list.txt > dest-rename-list.txt
paste -d ' ' src-rename-list.txt dest-rename-list.txt | sed -e 's/^/gsutil\ mv\ /' | while read line; do bash -c "$line"; done
rm src-rename-list.txt; rm dest-rename-list.txt

The solution builds two lists, one with the source files and one with the destination files (to be used in the "gsutil mv" command):

gsutil ls gs://bucket_name/*.JPG > src-rename-list.txt
sed 's/\.JPG/\.jpg/g' src-rename-list.txt > dest-rename-list.txt

The line "gsutil mv " and the two files are concatenated line by line using the below code:

paste -d ' ' src-rename-list.txt dest-rename-list.txt | sed -e 's/^/gsutil\ mv\ /'

Each resulting line is then executed in a while loop: while read line; do bash -c "$line"; done

Lastly, clean up and delete the files created:

rm src-rename-list.txt; rm dest-rename-list.txt

The above has been tested against a working Google Cloud Storage bucket.