Error connecting to Google Cloud Storage and querying a Parquet file in Apache Drill

I have a Parquet file in a Google Cloud Storage bucket that I want to query.

Following one of the answers I found, I added this configuration to core-site.xml under $DRILL_HOME/conf:

<configuration>
        <property>
                <name>fs.gs.impl</name>
                <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem</value>
        </property>

        <property>
                <name>fs.gs.project.id</name>
                <value><my_project_id></value>
        </property>

        <property>
                <name>google.cloud.auth.service.account.enable</name>
                <value>true</value>
        </property>

        <property>
                <name>google.cloud.auth.service.account.json.keyfile</name>
                <value><path_to_json></value>
        </property>
</configuration>

Then I added this to storage-plugins-override.conf:

{
  "name": "gcs",
  "config": {
    "connection": "gs://<my_bucket>",
    "enabled": true,
    "formats": {
      "json": {
        "type": "json"
      }
    }
  }
}

After saving this, I restarted Drill. When I run show schemas;, the gcs schema does not show up, which is blocking me from querying the Parquet file in GCS.

When I run use gcs;, I get this error:

Error: SYSTEM ERROR: ClassNotFoundException: Class com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem not found

Then I checked the logs in sqlline.log:

Caused by: java.lang.ClassNotFoundException: Class com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem not found

What am I missing? Thanks in advance.

I also tried creating a storage plugin for gcs in the Drill web UI. The configuration looks like this:

{
  "type": "file",
  "connection": "gs://my-bucket",
  "config": {
    "store.format": "parquet"
  },
  "formats": {
    "parquet": {
      "type": "parquet"
    }
  },
  "enabled": true
}

That does not work either.

There is 1 answer

Dzamo Norton

You should install Google's Cloud Storage connector for Hadoop into the jars/3rdparty directory on each of your Drillbits.
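
To confirm that the connector is actually visible on a given node, one option is to scan the jars in that directory for the class named in the ClassNotFoundException. The snippet below is only a rough sketch and not part of Drill: the helper name jars_containing_class is made up for illustration, and it assumes the connector ships as an ordinary jar (a zip archive) containing the class file.

```python
import zipfile
from pathlib import Path


def jars_containing_class(jar_dir, class_name):
    """Return the file names of jars in jar_dir that contain class_name.

    Hypothetical helper for illustration; not a Drill or Hadoop API.
    """
    # A class a.b.C is stored inside a jar as the entry a/b/C.class
    entry = class_name.replace(".", "/") + ".class"
    matches = []
    for jar in sorted(Path(jar_dir).glob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as zf:
                if entry in zf.namelist():
                    matches.append(jar.name)
        except zipfile.BadZipFile:
            # Skip files that are not valid archives, e.g. a truncated download
            continue
    return matches
```

For example, calling jars_containing_class("<DRILL_HOME>/jars/3rdparty", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem") with your own Drill path should return a non-empty list once the connector jar is in place. An empty list on any Drillbit would be consistent with the error in the question.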