Read Data from HBase running on EMR Cluster with Spark installed on local machine


I have HBase running on an EMR cluster, and I'm trying to access its tables with Spark from my local machine.

It seems to connect to ZooKeeper, but it can't find the table I'm looking for.

Here are my code, my hbase-site.xml, and the messages I get.

    package org.apache.spark.examples

    import org.apache.hadoop.fs.Path
    import org.apache.hadoop.hbase.HBaseConfiguration
    import org.apache.hadoop.hbase.client.HBaseAdmin
    import org.apache.hadoop.hbase.mapreduce.TableInputFormat
    import org.apache.spark._
    
    
    
    object HBaseTestEMR {
      def main(args: Array[String]) {
        val sparkConf = new SparkConf().setAppName("HBaseTest").setMaster("local[4]")
        val sc = new SparkContext(sparkConf)
    
        val conf = HBaseConfiguration.create()
     
        val table_name="empl"
        conf.addResource(new Path("/home/spark/development/hbase/conf/hbase-site.xml"))
        conf.set(TableInputFormat.INPUT_TABLE, table_name)
        
        println("-------------1")
        val admin = new HBaseAdmin(conf)
        //println(admin.listTables())
        println("-------------2")
        if (admin.isTableAvailable(table_name))  println("the table exists")
        else println("the table does not exist")
        println("-------------3")
    
    
        sc.stop()
    
      }
    }

hbase-site.xml

<configuration>
  <property><name>fs.hdfs.impl</name><value>emr.hbase.fs.BlockableFileSystem</value></property>
  <property><name>hbase.regionserver.handler.count</name><value>100</value></property>
  <property><name>hbase.zookeeper.quorum</name><value>ec2-52-26-***-***.us-west-2.compute.amazonaws.com</value></property>
  <property><name>hbase.rootdir</name><value>hdfs://10.0.0.25:9000/hbase</value></property>
  <property><name>hbase.cluster.distributed</name><value>true</value></property>
  <property><name>hbase.tmp.dir</name><value>/mnt/var/lib/hbase/tmp-data</value></property>
</configuration>

And the messages I get:

15/06/10 12:00:28 INFO ZooKeeper: Client environment:java.io.tmpdir=/tmp
15/06/10 12:00:28 INFO ZooKeeper: Client environment:java.compiler=<NA>
15/06/10 12:00:28 INFO ZooKeeper: Client environment:os.name=Linux
15/06/10 12:00:28 INFO ZooKeeper: Client environment:os.arch=amd64
15/06/10 12:00:28 INFO ZooKeeper: Client environment:os.version=3.2.0-67-generic
15/06/10 12:00:28 INFO ZooKeeper: Client environment:user.name=spark
15/06/10 12:00:28 INFO ZooKeeper: Client environment:user.home=/home/spark
15/06/10 12:00:28 INFO ZooKeeper: Client environment:user.dir=/home/spark/projetWordCount
15/06/10 12:00:28 INFO ZooKeeper: Initiating client connection, connectString=ec2-52-26-***-***.us-west-2.compute.amazonaws.com:2181 sessionTimeout=90000 watcher=hconnection-0x7ecf3c090x0, quorum=ec2-52-26-***-***.us-west-2.compute.amazonaws.com:2181, baseZNode=/hbase
15/06/10 12:00:28 INFO ClientCnxn: Opening socket connection to server ec2-52-26-***-***.us-west-2.compute.amazonaws.com/52.26.***.***:2181. Will not attempt to authenticate using SASL (unknown error)
15/06/10 12:00:28 INFO ClientCnxn: Socket connection established to ec2-52-26-***-***.us-west-2.compute.amazonaws.com/52.26.***.***:2181, initiating session
15/06/10 12:00:28 INFO ClientCnxn: Session establishment complete on server ec2-52-26-***-***.us-west-2.compute.amazonaws.com/52.26.***.***:2181, sessionid = 0x14ddc7d70ed0023, negotiated timeout = 90000
-------------2

After that, nothing happens.

So, is it possible to do what I want? And which part of my configuration is wrong?
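One way to narrow this down might be to check which cluster endpoints are reachable from the local machine at all. The log shows that the ZooKeeper session on port 2181 is established, but hbase.rootdir points at a private address (10.0.0.25:9000). If the cluster also registers its HBase servers in ZooKeeper under private addresses (an assumption, the log does not confirm it), the client could block while trying to reach them. Below is a minimal sketch using only java.net; the object name `ReachabilityCheck`, the helper `canConnect`, and the 3-second timeout are hypothetical choices for illustration, not part of HBase or Spark.

```scala
import java.net.{InetSocketAddress, Socket}
import scala.util.Try

// Hypothetical helper: probes plain TCP reachability only, it does not speak the HBase protocol.
object ReachabilityCheck {

  // Returns true if a TCP connection to host:port succeeds within timeoutMs.
  // Unresolvable hosts, refused connections and timeouts all yield false.
  def canConnect(host: String, port: Int, timeoutMs: Int = 3000): Boolean =
    Try {
      val socket = new Socket()
      try socket.connect(new InetSocketAddress(host, port), timeoutMs)
      finally socket.close()
    }.isSuccess

  // Each argument is expected in the form host:port.
  def main(args: Array[String]): Unit =
    args.foreach { arg =>
      val Array(host, port) = arg.split(":")
      val status = if (canConnect(host, port.toInt)) "reachable" else "NOT reachable"
      println(s"$host:$port is $status")
    }
}
```

Running it with the ZooKeeper host on port 2181 and with 10.0.0.25:9000 as arguments would show whether the private address from hbase-site.xml can be reached from outside the cluster. A "NOT reachable" result would point at networking (security groups, private addressing) rather than at the Spark code itself.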
