Gsutil - Issue with downloading data from GCS buckets


I have some data in GCS buckets. I have an application that runs the gsutil command to download this data from the buckets and then does some further processing.

Before

  • I had gsutil 4.7 installed, and my .boto file was successfully configured with the OAuth refresh token, proxy host name, proxy port, and project ID.
  • My application worked fine: data was downloaded by calling the following command through Python's subprocess module.

    "gsutil -m cp -r gcs_path destination_path"

As noted above, this command was run from within a Python application.
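For reference, here is a minimal sketch of running the same command from Python with `subprocess.run`. It passes the arguments as a list and blocks until gsutil exits. The helper names `build_gsutil_cp_args` and `run_gsutil_cp` are hypothetical, `gcs_path` and `destination_path` are the same placeholders used above, and this is not necessarily how the original application invokes gsutil.

```python
import subprocess
from typing import List


def build_gsutil_cp_args(gcs_path: str, destination_path: str, parallel: bool = True) -> List[str]:
    """Build the argument list for `gsutil [-m] cp -r SRC DST` (hypothetical helper)."""
    args = ["gsutil"]
    if parallel:
        # -m is a top-level gsutil option, so it goes before the subcommand
        args.append("-m")
    args += ["cp", "-r", gcs_path, destination_path]
    return args


def run_gsutil_cp(gcs_path: str, destination_path: str, parallel: bool = True) -> subprocess.CompletedProcess:
    """Run gsutil, wait for it to exit, and raise CalledProcessError on failure."""
    args = build_gsutil_cp_args(gcs_path, destination_path, parallel)
    # A list argument (no shell=True) avoids shell quoting problems with paths.
    # subprocess.run blocks until gsutil exits, so any post-processing that
    # follows this call does not start while the copy is still in progress.
    return subprocess.run(args, check=True, capture_output=True, text=True)
```

By contrast, if an application starts gsutil with `subprocess.Popen` and never calls `wait()` or `communicate()`, the code after it can run while gsutil is still writing files.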

Now

  • Suddenly, I noticed that the downloaded files were smaller than the corresponding objects in the GCS buckets. For example, a 49 MB file was downloaded only partially, ending up at anywhere from 0 to 20 KB.
  • This happened only when gsutil was called from within the Python application.
  • Moreover, while the application runs the gsutil command, stdout reports that 49 MB out of 49 MB was downloaded. But when I go to the destination directory and check the file sizes, they are way off.
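Rather than checking sizes by hand, a small helper can compare the local files with the sizes reported for the objects in the bucket (for example, as listed by `gsutil ls -l`). This is a minimal sketch: `find_size_mismatches` is a hypothetical name, and it assumes the caller has already built a mapping from paths relative to `destination_path` to the expected size in bytes.

```python
import os
from typing import Dict, List, Optional, Tuple


def find_size_mismatches(
    expected_sizes: Dict[str, int], destination_path: str
) -> List[Tuple[str, int, Optional[int]]]:
    """Compare expected object sizes with the files on local disk.

    expected_sizes maps a path relative to destination_path to its size in
    bytes. Returns (relative_path, expected, actual) for every file that is
    missing (actual is None) or whose local size differs from the expected one.
    """
    mismatches: List[Tuple[str, int, Optional[int]]] = []
    for rel_path, expected in sorted(expected_sizes.items()):
        local_file = os.path.join(destination_path, rel_path)
        if not os.path.isfile(local_file):
            # The object was never written locally
            mismatches.append((rel_path, expected, None))
            continue
        actual = os.path.getsize(local_file)
        if actual != expected:
            # Truncated or otherwise incomplete download
            mismatches.append((rel_path, expected, actual))
    return mismatches
```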

However, when I run the same gsutil command outside the Python application, i.e., directly as the following Linux command in my terminal, the data IS downloaded fully: all 49 MB out of 49 MB.

$ gsutil -m cp -r gcs_path destination_path
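Since the same command behaves differently in the terminal and under Python, one thing worth ruling out is that the Python process sees a different environment: for example, a different `gsutil` on `PATH`, or a different `BOTO_CONFIG`/`HOME` and therefore a different .boto file. The sketch below (hypothetical helper `describe_gsutil_environment`) reports those values so they can be compared with `which gsutil` and `echo $BOTO_CONFIG` in the terminal. It is a diagnostic aid under that assumption, not a known fix.

```python
import os
import shutil
from typing import Dict, Mapping, Optional


def describe_gsutil_environment(env: Optional[Mapping[str, str]] = None) -> Dict[str, Optional[str]]:
    """Report environment details that can change which gsutil and .boto are used."""
    if env is None:
        env = os.environ
    return {
        # Resolve 'gsutil' against the PATH the subprocess would inherit
        # (shutil.which falls back to os.environ's PATH if env has none)
        "gsutil_path": shutil.which("gsutil", path=env.get("PATH")),
        # boto-based gsutil can be pointed at a config file via BOTO_CONFIG;
        # otherwise it looks for ~/.boto under HOME, among other defaults
        "BOTO_CONFIG": env.get("BOTO_CONFIG"),
        "HOME": env.get("HOME"),
        # A relative destination_path is resolved against the working directory
        "cwd": os.getcwd(),
    }
```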

I am not running out of disk space. I upgraded gsutil from 4.7 to 4.13, but that did not help.

Is there something that I may be missing here? Thanks in advance!

