Executing OLAP functions with Spark SQL


I am working with Spark version 1.6. I want to execute OLAP functions, including CUBE, ROLLUP and GROUPING SETS, through SQL queries on Spark. I understand that the cube and rollup functions are available in the DataFrame API, but how can I execute them through SQL queries?

Do I need to use HiveContext for this? Do I need a Hive setup for that? And how do I persist the DataFrame as a table that can be accessed by a SQL query executed through HiveContext?

It would be great to see some example code. Thanks.

1 Answer

Answered by Antoine CHAMBILLE:

At ActiveViam we also needed to run interactive OLAP queries on Apache Spark, so we built an extension called Sparkube that exposes a Spark dataset as a multidimensional cube, without moving the data.

Once your dataset is exposed that way, you get access to all the OLAP MDX functions directly on top of Spark, including DRILLDOWN and DRILLUP.

Here is, for instance, how you mount the content of a CSV file in memory and expose it as a multidimensional cube:

// Load a file into a Spark dataset.
// Here we load a CSV file, take the column names from
// the CSV header, and use Spark's automatic type inference.
val ds = spark.read
  .format("csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .load("path/to/file.csv")

// Import the sparkube library (the sparkube jar must be in the classpath)
import com.activeviam.sparkube._

// Expose the dataset as a multidimensional cube.
// You can start visualizing the cube right away at http://localhost:9090/ui
// or connect from Excel with the XMLA protocol at http://localhost:9090/xmla
new Sparkube().fromDataset(ds)
  .withName("My first cube")
  .expose()
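To answer the original question directly: in Spark 1.6 you can register a DataFrame as a temporary table and then run CUBE, ROLLUP and GROUPING SETS through SQL with a HiveContext. A full Hive installation is not required; HiveContext creates a local metastore on its own. The following is a minimal sketch, using a small hypothetical dataset with columns city, year and amount:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

val sc = new SparkContext(new SparkConf().setAppName("olap-sql-example"))
// HiveContext does not need an existing Hive deployment;
// it initializes a local metastore on first use.
val sqlContext = new HiveContext(sc)
import sqlContext.implicits._

// Hypothetical sample data for illustration.
val df = Seq(
  ("Paris", 2015, 10),
  ("Paris", 2016, 20),
  ("London", 2016, 30)
).toDF("city", "year", "amount")

// Register the DataFrame so SQL queries can reference it by name.
df.registerTempTable("sales")

// CUBE: aggregates over every combination of the grouping columns.
sqlContext.sql(
  "SELECT city, year, SUM(amount) FROM sales GROUP BY city, year WITH CUBE").show()

// ROLLUP: hierarchical aggregates (city+year, city, grand total).
sqlContext.sql(
  "SELECT city, year, SUM(amount) FROM sales GROUP BY city, year WITH ROLLUP").show()

// GROUPING SETS: an explicit list of grouping combinations.
sqlContext.sql(
  "SELECT city, year, SUM(amount) FROM sales " +
  "GROUP BY city, year GROUPING SETS ((city, year), (city))").show()

Note that registerTempTable is the Spark 1.6 API; in later versions it was replaced by createOrReplaceTempView, and CUBE/ROLLUP became available in plain SparkSession SQL without a HiveContext.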