How to read and write Hive tables from Apache Spark on IBM Analytics Engine

I would like to use Spark on IBM Analytics Engine from a Jupyter notebook in Watson Studio to read and write Hive tables, but it isn't clear how to do this from the Spark documentation alone. The reason it isn't clear is that IBM Analytics Engine comes pre-configured with Spark and does not provide root access.

I have found some posts online for generic Hadoop that describe creating a hive-site.xml with the metastore location, but it isn't clear how this translates to the IBM Analytics Engine environment. For example: Unable to write data on hive using spark
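
As far as I can tell, the generic approach in those posts amounts to pointing Spark at an existing Hive metastore, along these lines (the thrift host and port below are placeholders from those posts, not Analytics Engine values):

    import org.apache.spark.sql.hive.HiveContext

    val hc = new HiveContext(sc)
    // Placeholder metastore URI taken from the generic examples
    hc.setConf("hive.metastore.uris", "thrift://metastore-host:9083")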

1 Answer

Accepted answer, from Chris Snow:

Here is an example using Scala.

Skip the first two steps if you are running Spark directly on the Analytics Engine cluster.

  1. Create a Watson Studio project and associate it with an Analytics Engine service.

  2. Create a Scala notebook that uses the Analytics Engine Spark service.

  3. Enter the following Scala code to create a HiveContext and list the Hive databases, if any (a Spark 2.x equivalent using SparkSession appears in the notes after these steps):

    import org.apache.spark.sql.hive.HiveContext
    val hc = new HiveContext(sc)

    // uncomment and adjust the next line if you are using
    // Compose mysql for the hive metastore

    /*
       hc.setConf("hive.metastore.warehouse.dir",
         "mysql://admin:<password>@<host>:32023/compose")
    */

    import hc.implicits._
    val df = hc.sql("show databases")
    df.show
  4. Create a DataFrame from static values for testing:
    val test_df = Seq(
      (8, "bat"),
      (64, "mouse"),
      (-27, "horse")
    ).toDF("number", "word")
  5. Write the DataFrame to a Hive table (a variant with an explicit format is shown in the notes after these steps):
    test_df.write.mode("overwrite").saveAsTable("src")
  6. Now verify that you are able to read the table from Spark (the notes after these steps show the same read via the table API):
    val read_df = hc.sql("select * from src")
    read_df.show
  7. Open a Hive session and verify that you can query the table from Hive:
    select * from src;
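
A few notes on the steps above:

Spark 2.x (step 3): HiveContext is deprecated as of Spark 2.0. If your cluster runs Spark 2.x, a rough equivalent using SparkSession looks like the following sketch (not Analytics Engine-specific):

    import org.apache.spark.sql.SparkSession

    // Build (or reuse) a session with Hive support enabled; in many
    // notebook environments a Hive-enabled `spark` session already exists.
    val spark = SparkSession.builder()
      .enableHiveSupport()
      .getOrCreate()

    spark.sql("show databases").show()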
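
The write (step 5): saveAsTable also accepts an explicit format and a database-qualified table name if you want more control. A minimal sketch; the Parquet format and the default database here are illustrative choices, not requirements:

    // Write the test DataFrame as a Parquet-backed table in the
    // `default` database, rather than relying on the defaults
    test_df.write
      .mode("overwrite")
      .format("parquet")
      .saveAsTable("default.src")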
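
The read (step 6): the same read can be done without SQL, using the table API on the same hc from step 3:

    // Equivalent to "select * from src", via the DataFrame API
    val read_df2 = hc.table("src")
    read_df2.show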