Failed to copy Hadoop and Java packages to Google Cloud Storage


I am trying to set up a Hadoop cluster on Google Compute Engine and have been following these instructions. Everything seemed to work fine until I ran:

./compute_cluster_for_hadoop.py setup <project ID> <bucket name>

with my project ID and the name of a bucket I created. The script does not seem to have access to something and crashes with a 403; here is the tail end of the output with the error messages:

Uploading   ...kages/ca-certificates-java_20121112+nmu2_all.deb: 14.57 KB/14.57 KB    
Uploading   ...duce/tmp/deb_packages/libnspr4_4.9.2-1_amd64.deb: 316 B/316 B    
Uploading   ...e/tmp/deb_packages/libnss3-1d_3.14.3-1_amd64.deb: 318 B/318 B    
Uploading   ...dk-6-jre-headless_6b27-1.12.6-1~deb7u1_amd64.deb: 366 B/366 B    
Uploading   ...duce/tmp/deb_packages/libnss3_3.14.3-1_amd64.deb: 315 B/315 B    
ResumableUploadAbortException: 403 Forbidden
AccessDeniedException: 403 Forbidden
AccessDeniedException: 403 Forbidden
AccessDeniedException: 403 Forbidden
AccessDeniedException: 403 Forbidden
ResumableUploadAbortException: 403 Forbidden
AccessDeniedException: 403 Forbidden
CommandException: 7 files/objects could not be transferred.

########## ERROR ##########
Failed to copy Hadoop and Java packages to Cloud Storage gs://<bucket name>/mapreduce/tmp/
###########################

Traceback (most recent call last):
  File "./compute_cluster_for_hadoop.py", line 230, in <module>
    main()
  File "./compute_cluster_for_hadoop.py", line 226, in main
    ComputeClusterForHadoop().ParseArgumentsAndExecute(sys.argv[1:])
  File "./compute_cluster_for_hadoop.py", line 222, in ParseArgumentsAndExecute
    params.handler(params)
  File "./compute_cluster_for_hadoop.py", line 36, in SetUp
    gce_cluster.GceCluster(flags).EnvironmentSetUp()
  File "/Path/To/solutions-google-compute-engine-cluster-for-hadoop/gce_cluster.py", line 149, in EnvironmentSetUp
    raise EnvironmentSetUpError('Environment set up failed.')
gce_cluster.EnvironmentSetUpError: Environment set up failed.
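Since every failure is a 403, this looks like a permissions problem rather than a transient upload error. One quick way to confirm whether the active gsutil credentials can write to that bucket at all (the test object name below is made up; the bucket name is a placeholder):

# Check that bucket metadata is readable; a 403 here means no bucket access at all.
gsutil ls -b gs://<bucket name>

# Try a small streamed upload to the same prefix the script uses, then clean up.
echo test | gsutil cp - gs://<bucket name>/mapreduce/tmp/permission-test.txt
gsutil rm gs://<bucket name>/mapreduce/tmp/permission-test.txt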
1 Answer

Answered by Yaniv Donenfeld:

I recommend switching to the newer, actively maintained "bdutil" package from Google. You can find the details in the GCP Hadoop announcement forum.

The most recent announcement links to the latest "bdutil" release (currently 0.36.4). It simplifies cluster deployment and supports both Hadoop and Spark clusters.
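For example, a typical bdutil deployment looks roughly like this (a sketch from memory of the bdutil README of that era; flag names may differ in your version, and the project/bucket values are placeholders):

# Deploy a small Hadoop cluster with 2 workers, using the given bucket for setup files.
./bdutil -p <project ID> -b <bucket name> -n 2 deploy

# Tear the cluster down when finished.
./bdutil -p <project ID> -b <bucket name> delete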

Also, I would like to recommend:

  1. Deploy the cluster from a machine within GCE. It makes the process faster and more reliable.

  2. In the file bdutil_env.sh, change the parameter GCUTIL_SLEEP_TIME_BETWEEN_ASYNC_CALLS_SECONDS from 0.1 to 0.5; for me, this fixed recurring deployment errors. See the sketch below.
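The change in point 2 is a one-line edit (the 0.1 default and the variable name are as described above):

# In bdutil_env.sh: raise the delay between asynchronous gcutil calls
# from the shipped 0.1 seconds to 0.5 seconds.
GCUTIL_SLEEP_TIME_BETWEEN_ASYNC_CALLS_SECONDS=0.5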