I have deployed a 6-node Hadoop cluster on Google Compute Engine.
I am using Google Cloud Storage (GCS) instead of the Hadoop Distributed File System (HDFS).

So, I want to access files in GCS the same way the DistributedCache mechanism accesses files in HDFS.
Please tell me a way to access files this way.
When running Hadoop on Google Compute Engine with the Google Cloud Storage connector for Hadoop as the "default filesystem", the GCS connector can be treated exactly the same way HDFS is treated, including for usage in the DistributedCache. So, to access files in Google Cloud Storage, you use it exactly the same way you would use HDFS; there is no need to change anything. For example, if you had deployed your cluster with your GCS connector's CONFIGBUCKET set to foo-bucket, and you had local files you wanted to place in the DistributedCache, you'd copy them in with hadoop fs and then register them in your Hadoop job, as shown below.
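A minimal sketch of both steps, assuming a hypothetical local file mylib.jar that should end up at /myapp/mylib.jar:

```sh
# Copies mylib.jar into gs://foo-bucket/myapp/mylib.jar, since foo-bucket
# (the CONFIGBUCKET) is the default filesystem.
hadoop fs -copyFromLocal mylib.jar /myapp/mylib.jar
```

And in your Hadoop job:

```java
// Requires org.apache.hadoop.filecache.DistributedCache,
// org.apache.hadoop.fs.Path and org.apache.hadoop.mapred.JobConf.
JobConf job = new JobConf();

// The relative path resolves against the default filesystem, so this
// pulls gs://foo-bucket/myapp/mylib.jar onto the job's classpath.
DistributedCache.addFileToClassPath(new Path("/myapp/mylib.jar"), job);
```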
If you want to access files in a different bucket than your CONFIGBUCKET, you just need to specify a full path, using gs:// instead of hdfs://; see the sketch below.
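For example, with a hypothetical bucket named other-bucket:

```sh
# Copies mylib.jar into gs://other-bucket/myapp/mylib.jar, outside the
# default CONFIGBUCKET.
hadoop fs -copyFromLocal mylib.jar gs://other-bucket/myapp/mylib.jar
```

and then in Java:

```java
JobConf job = new JobConf();

// A fully qualified gs:// URI works anywhere an hdfs:// URI would.
DistributedCache.addFileToClassPath(new Path("gs://other-bucket/myapp/mylib.jar"), job);
```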