Error connecting to Google Cloud and querying a Parquet file in Apache Drill

Question

35 views Asked by Dheeraj P At 29 September 2023 at 13:34

I have a Parquet file in a Google Cloud Storage bucket that I want to query.

Following one of the answers I found, I added this configuration to core-site.xml under $DRILL_HOME/conf:

<configuration>
        <property>
                <name>fs.gs.impl</name>
                <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem</value>
        </property>

        <property>
                <name>fs.gs.project.id</name>
                <value><my_project_id></value>
        </property>

        <property>
                <name>google.cloud.auth.service.account.enable</name>
                <value>true</value>
        </property>

        <property>
                <name>google.cloud.auth.service.account.json.keyfile</name>
                <value><path_to_json></value>
        </property>
</configuration>

Then I added this to storage-plugins-override.conf:

{
  "name": "gcs",
  "config": {
    "connection": "gs://<my_bucket>",
    "enabled": true,
    "formats": {
      "json": {
        "type": "json"
      }
    }
  }
}

After saving this, I restarted Drill. When I run show schemas;, the gcs schema does not show up, which is blocking me from querying the Parquet file in GCS.

When I run use gcs;, I get this error:

Error: SYSTEM ERROR: ClassNotFoundException: Class com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem not found

Then I checked sqlline.log, which shows:

Caused by: java.lang.ClassNotFoundException: Class com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem not found

What am I missing? Thanks in advance.

I also tried creating a storage plugin for gcs in the Drill web UI. The configuration looks like this:

{
  "type": "file",
  "connection": "gs://my-bucket",
  "config": {
    "store.format": "parquet"
  },
  "formats": {
    "parquet": {
      "type": "parquet"
    }
  },
  "enabled": true
}

That does not work either.


**Dzamo Norton** · Answer 1 · 2023-10-23T09:11:25+00:00

You should install Google's Cloud Storage connector for Hadoop into the jars/3rdparty directory on each of your Drillbits.
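One way to confirm whether the connector is actually present is to look inside the jars for the class named in the ClassNotFoundException. The following is a minimal sketch, not part of Drill: the helper name `jars_containing` is hypothetical, and it assumes the directory from the answer lives at $DRILL_HOME/jars/3rdparty. It makes no assumption about the connector jar's file name, since it searches by class entry.

```python
import os
import zipfile
from pathlib import Path

# The class from the ClassNotFoundException, written as a path inside a jar.
GCS_CLASS_ENTRY = "com/google/cloud/hadoop/fs/gcs/GoogleHadoopFileSystem.class"


def jars_containing(jar_dir, class_entry=GCS_CLASS_ENTRY):
    """Return the sorted names of jars in jar_dir that contain class_entry.

    Hypothetical helper for illustration; a jar is just a zip archive,
    so the standard zipfile module can list its entries.
    """
    jar_dir = Path(jar_dir)
    if not jar_dir.is_dir():
        return []
    found = []
    for jar in sorted(jar_dir.glob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as zf:
                if class_entry in zf.namelist():
                    found.append(jar.name)
        except zipfile.BadZipFile:
            # Skip files that are not valid archives.
            continue
    return found


if __name__ == "__main__":
    # Assumption: DRILL_HOME is set; fall back to the current directory.
    drill_home = os.environ.get("DRILL_HOME", ".")
    third_party = Path(drill_home) / "jars" / "3rdparty"
    matches = jars_containing(third_party)
    if matches:
        print("Connector class found in:", ", ".join(matches))
    else:
        print("No jar in", third_party, "provides", GCS_CLASS_ENTRY)
```

If the script reports no match on a given Drillbit, the error in the question is expected on that node. The check has to pass on every Drillbit, and Drill typically needs a restart after a jar is added before it picks up the new classes.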
