Query Iceberg tables on Spark Thrift Server with Glue catalog

I started a Spark Thrift Server with the configuration below to query Iceberg tables stored in S3. The Iceberg catalog is the AWS Glue catalog. I can run Spark SQL through PySpark against these Iceberg tables with the same configuration, but I cannot find any documentation on querying them through the Spark Thrift Server. Is there a way to connect to the Glue catalog from beeline over JDBC and start querying?

Spark version: spark-3.3.3-bin-hadoop3

sbin/start-thriftserver.sh \
  --packages org.apache.iceberg:iceberg-spark-runtime-3.3_2.12:1.3.1,software.amazon.awssdk:bundle:2.20.18,software.amazon.awssdk:url-connection-client:2.20.18 \
  --conf spark.driver.extraJavaOptions="-Divy.cache.dir=/tmp -Divy.home=/tmp" \
  --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
  --conf spark.sql.defaultCatalog=iceberg_catalog \
  --conf spark.sql.catalog.iceberg_catalog=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.iceberg_catalog.warehouse=s3://bucket/iceberg/ \
  --conf spark.sql.catalog.iceberg_catalog.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog \
  --conf spark.sql.catalog.iceberg_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO

When I connect with !connect jdbc:hive2://localhost:10000 in beeline, I get the following error:

Error: Could not open client transport with JDBC Uri: jdbc:hive2://localhost:10000: Can't overwrite cause with java.lang.ClassNotFoundException: org.apache.iceberg.spark.SparkCatalog (state=08S01,code=0)

There is 1 answer

Answered by Fiza

I fixed this by manually placing the jars from the --packages list in the $SPARK_HOME/jars folder. It seems that the --packages option does not work as expected with the Spark Thrift Server.
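
The manual fix could look roughly like the sketch below. It assumes the same three Maven coordinates as the question's --packages list, that the jars are fetched from Maven Central (the answer does not say where they came from), and that SPARK_HOME points at the Spark install. The helper names maven_jar_url and install_jars are made up for illustration. Unlike --packages, this route does not resolve transitive dependencies, so treat it as a starting point rather than a verified recipe.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, and Maven Central as the
# download source is an assumption not stated in the answer.

# Turn a Maven coordinate (group:artifact:version) into its Maven Central jar URL.
maven_jar_url() {
  group=$(echo "$1" | cut -d: -f1 | tr . /)
  artifact=$(echo "$1" | cut -d: -f2)
  version=$(echo "$1" | cut -d: -f3)
  echo "https://repo1.maven.org/maven2/${group}/${artifact}/${version}/${artifact}-${version}.jar"
}

# Download the jar for each coordinate into $SPARK_HOME/jars.
install_jars() {
  for coord in "$@"; do
    url=$(maven_jar_url "$coord")
    curl -fsSL -o "${SPARK_HOME}/jars/$(basename "$url")" "$url" || return 1
  done
}

# Only act when SPARK_HOME points at a Spark install.
# The server is started with the question's catalog settings, minus --packages
# and the Ivy options, since nothing is resolved at startup any more.
if [ -n "${SPARK_HOME:-}" ] && [ -d "${SPARK_HOME}/jars" ]; then
  install_jars \
    org.apache.iceberg:iceberg-spark-runtime-3.3_2.12:1.3.1 \
    software.amazon.awssdk:bundle:2.20.18 \
    software.amazon.awssdk:url-connection-client:2.20.18 &&
  "${SPARK_HOME}/sbin/start-thriftserver.sh" \
    --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
    --conf spark.sql.defaultCatalog=iceberg_catalog \
    --conf spark.sql.catalog.iceberg_catalog=org.apache.iceberg.spark.SparkCatalog \
    --conf spark.sql.catalog.iceberg_catalog.warehouse=s3://bucket/iceberg/ \
    --conf spark.sql.catalog.iceberg_catalog.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog \
    --conf spark.sql.catalog.iceberg_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO
fi
```

Once the server is up, the same !connect jdbc:hive2://localhost:10000 from the question can be retried in beeline. With the jars already on the server's classpath, org.apache.iceberg.spark.SparkCatalog should be loadable at session start.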