Reading an ORC file from a GCS bucket


To read an ORC file from a GCS bucket, I'm using the code snippet below. It creates a Hadoop configuration and sets the file system properties required to use the GCS bucket:

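      import org.apache.hadoop.conf.Configuration
      import org.apache.hadoop.fs.Path
      import org.apache.orc.OrcFile // assumption: OrcFile comes from orc-core

      // Register the GCS connector for the gs:// scheme
      val hadoopConf = new Configuration()
      hadoopConf.set("fs.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem")
      hadoopConf.set("fs.AbstractFileSystem.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS")
      hadoopConf.set("fs.defaultFS", "gs://BUCKET_NAME")
      // Authenticate with the service account key file named by the environment variable
      hadoopConf.set("fs.gs.auth.service.account.enable", "true")
      hadoopConf.set("fs.gs.auth.service.account.json.keyfile", System.getenv("GOOGLE_APPLICATION_CREDENTIALS"))
      // Relative path, resolved against fs.defaultFS
      val filePath = "path/to/file.orc"
      val reader = OrcFile.createReader(new Path(filePath), OrcFile.readerOptions(hadoopConf))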

The object gs://BUCKET_NAME/path/to/file.orc exists in the bucket.

But when I run this code, it hangs, and the last log line is:

WARN  [.h.u.NativeCodeLoader] - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

What am I missing?

Dependencies

    "org.apache.hadoop"       % "hadoop-common"           % "3.2.1",
    "org.apache.hadoop"       % "hadoop-hdfs"             % "3.2.1",
    "org.apache.hadoop"       % "hadoop-hdfs-client"      % "3.2.1",
    "com.google.cloud.bigdataoss" % "gcs-connector" % "hadoop2-2.2.0",
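
To narrow down where the read stalls, a minimal diagnostic sketch like the one below may help. It is not a confirmed fix. It assumes `orc-core` is on the classpath (it is not in the dependency list above), and the object name `OrcGcsCheck` and the helper `gcsUri` are hypothetical names introduced only for illustration. It uses a fully qualified `gs://` URI instead of relying on `fs.defaultFS`, fails fast if the credentials variable is unset, and prints a line after each step so the last successful step is visible.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.orc.OrcFile // assumption: orc-core is on the classpath

object OrcGcsCheck {
  // Hypothetical helper: builds a fully qualified gs:// URI from a bucket and an object key.
  def gcsUri(bucket: String, key: String): String =
    s"gs://$bucket/${key.stripPrefix("/")}"

  def main(args: Array[String]): Unit = {
    // Fail fast with a clear message if the credentials variable is not set.
    val keyFile = sys.env.getOrElse(
      "GOOGLE_APPLICATION_CREDENTIALS",
      sys.error("GOOGLE_APPLICATION_CREDENTIALS is not set"))

    // Same connector settings as in the question, without fs.defaultFS.
    val conf = new Configuration()
    conf.set("fs.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem")
    conf.set("fs.AbstractFileSystem.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS")
    conf.set("fs.gs.auth.service.account.enable", "true")
    conf.set("fs.gs.auth.service.account.json.keyfile", keyFile)

    // Step 1: resolve the file system for the gs:// scheme.
    val path = new Path(gcsUri("BUCKET_NAME", "path/to/file.orc"))
    val fs = path.getFileSystem(conf)
    println(s"file system: ${fs.getClass.getName}")

    // Step 2: a cheap metadata call, to separate connectivity issues from ORC issues.
    println(s"exists: ${fs.exists(path)}")

    // Step 3: open the ORC reader on the already resolved file system.
    val reader = OrcFile.createReader(path, OrcFile.readerOptions(conf).filesystem(fs))
    println(s"schema: ${reader.getSchema}")
    println(s"rows: ${reader.getNumberOfRows}")
  }
}
```

If the program stops before printing `exists: ...`, the stall is in reaching or authenticating against GCS rather than in the ORC reader itself.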