Does the gcloud storage Python client API support parallel composite upload?


The gsutil command has options to optimize upload/download speed for large files. For example:

GSUtil:parallel_composite_upload_threshold=150M
GSUtil:sliced_object_download_max_components=8

see this page for reference.

What is the equivalent in the google.cloud.storage Python API? I couldn't find the relevant parameters in this document.

In general, do the client API and gsutil have a one-to-one correspondence in functionality?


There are 2 answers

DazWilkin (best answer)

I don't think it's natively supported.

However (!), if you're willing to split files yourself and then use threading or multiprocessing to upload the parts, there is a compose method that should help you assemble the parts into a single GCS object. A sketch follows.
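A minimal sketch of that approach, assuming the file fits in at most 32 parts (the limit for a single compose call); the function name composite_upload and its parameter defaults are illustrative, not part of the library:

import os
from concurrent.futures import ThreadPoolExecutor

from google.cloud import storage


def composite_upload(bucket_name, source_file, dest_name,
                     part_size=150 * 1024 * 1024, max_workers=8):
    """Upload source_file in parallel parts, then compose them into one object."""
    bucket = storage.Client().bucket(bucket_name)
    size = os.path.getsize(source_file)
    # compose() accepts at most 32 source objects per call.
    n_parts = (size + part_size - 1) // part_size

    def upload_part(index):
        # Read one byte range and upload it as a temporary part object.
        part = bucket.blob(f"{dest_name}.part{index}")
        with open(source_file, "rb") as f:
            f.seek(index * part_size)
            part.upload_from_string(f.read(part_size))
        return part

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        parts = list(pool.map(upload_part, range(n_parts)))

    # Stitch the parts into the final object, then clean up the temporaries.
    bucket.blob(dest_name).compose(parts)
    for part in parts:
        part.delete()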

Ironically, gsutil is itself written in Python, but it uses a library, gslib, to implement parallel uploads. You may be able to use gslib as a template.

Chris Madden

Check out the transfer_manager module of the Python Client for Google Cloud Storage, or this sample code. It has methods for file upload/download: you pass it the object to copy, set the chunk size and number of workers, and the module takes care of the rest.
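For example, a minimal sketch of a chunked upload and download with transfer_manager (requires a recent google-cloud-storage release; the bucket and file names are placeholders):

from google.cloud import storage
from google.cloud.storage import transfer_manager

client = storage.Client()
bucket = client.bucket("my-bucket")  # placeholder bucket name
blob = bucket.blob("large-file.bin")

# Parallel upload: the local file is split into chunks and uploaded
# concurrently via the XML Multipart Upload API.
transfer_manager.upload_chunks_concurrently(
    "large-file.bin", blob, chunk_size=32 * 1024 * 1024, max_workers=8
)

# Parallel (sliced) download of a single object into a local file.
transfer_manager.download_chunks_concurrently(
    blob, "large-file-copy.bin", chunk_size=32 * 1024 * 1024, max_workers=8
)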

It uses the Multipart XML API, which allows even higher parallelization than gcloud storage or gsutil, and avoids potential bucket feature interop issues too. I wrote about this and more in my recent blog post "High throughput file transfers with Google Cloud Storage (GCS)".