Can't reach HBase (on S3) from PySpark


I have created a cluster on EMR and I want to use HBase with PySpark. I am new to distributed systems, so I might be making amateur mistakes, but connecting to HBase from PySpark feels very hard.

My code:

from pyspark.sql import SparkSession


spark = SparkSession.builder.appName("test hbase").config("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem").getOrCreate()
spark.conf.set("hbase.zookeeper.quorum", "quorum_url")
spark.conf.set("hbase.zookeeper.property.clientPort", "2181")
spark.sparkContext.setSystemProperty("hbase.rootdir", "s3://emr-test/data/")
spark.sparkContext.setSystemProperty("hbase.cluster.distributed", "true")
spark.sparkContext.setSystemProperty("hbase.regionserver.global.memstore.upperLimit", "0.5")

# Read the HBase table through the hbase-spark connector data source
hbase_tables = (
    spark.read.format("org.apache.hadoop.hbase.spark")
    .option("hbase.table", "example_table")
    .load()
)

print(hbase_tables)

spark.stop()

I have made sure the table exists by logging into the HBase shell from the master node and scanning the table.

The error I get:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/spark/python/pyspark/sql/readwriter.py", line 314, in load
    return self._df(self._jreader.load())
  File "/usr/lib/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1322, in __call__
  File "/usr/lib/spark/python/pyspark/errors/exceptions/captured.py", line 179, in deco
    return f(*a, **kw)
  File "/usr/lib/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py", line 326, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o238.load.
: org.apache.spark.SparkClassNotFoundException: [DATA_SOURCE_NOT_FOUND] Failed to find the data source: org.apache.hadoop.hbase.spark. Please find packages at `https://spark.apache.org/third-party-projects.html`.

I am using Amazon EMR version emr-7.0.0, so the installed application versions are Spark 3.5.0 and HBase 2.4.17.

How I am executing it: spark-submit test.py. Maybe I am missing JARs. If that is the case, can someone tell me how I can get the JARs that are meant for my versions of HBase and Spark?
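From the DATA_SOURCE_NOT_FOUND error, my guess is that the hbase-spark connector class simply isn't on the Spark classpath, so it would either need to be pulled in as a package or passed with --jars if the connector jars already ship with the cluster. Below is a minimal sketch of what I imagine that would look like; the artifact coordinates and version (org.apache.hbase.connectors.spark:hbase-spark:1.0.0), as well as the df name, are assumptions on my part and not something I have verified against Spark 3.5.0 / HBase 2.4.17 — is this roughly the right approach?

from pyspark.sql import SparkSession

# Sketch: pull the connector at startup via spark.jars.packages.
spark = (
    SparkSession.builder
    .appName("test hbase")
    .config(
        "spark.jars.packages",
        # Assumed Maven coordinates; version not verified for EMR 7.0.0
        "org.apache.hbase.connectors.spark:hbase-spark:1.0.0",
    )
    .getOrCreate()
)

# Same read as above, once the connector is on the classpath
df = (
    spark.read.format("org.apache.hadoop.hbase.spark")
    .option("hbase.table", "example_table")
    .load()
)
df.show()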
