I am working with Spark version 1.6. I want to execute OLAP functions, including CUBE, ROLLUP, and GROUPING SETS, through SQL queries on Spark. I understand that the cube and rollup functions are available in the DataFrame API, but how can I execute them through SQL queries?
Do I need to use HiveContext for this? Do I need to have Hive set up for that? And how do I register the DataFrame as a table so that it can be queried with SQL through the HiveContext?
It would be great to see some example code. Thanks.
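For reference, here is a minimal sketch of the HiveContext approach in Spark 1.6. The HiveContext does not require an existing Hive installation (it creates a local metastore by default), and its parser accepts the Hive QL `GROUPING SETS`, `WITH CUBE`, and `WITH ROLLUP` syntax that the plain SQLContext parser does not. The table name, column names, and sample data are illustrative:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

val sc = new SparkContext(
  new SparkConf().setAppName("olap-demo").setMaster("local[*]"))

// HiveContext works out of the box; no Hive server or metastore setup needed.
val hiveContext = new HiveContext(sc)
import hiveContext.implicits._

// Illustrative sample data.
val df = sc.parallelize(Seq(
  ("east", "books", 10), ("east", "toys", 20),
  ("west", "books", 15), ("west", "toys", 5)
)).toDF("region", "category", "sales")

// Spark 1.6 API for making a DataFrame visible to SQL
// (renamed to createOrReplaceTempView in Spark 2.0).
df.registerTempTable("sales")

// GROUPING SETS via Hive QL; () is the grand-total grouping set.
hiveContext.sql(
  """SELECT region, category, SUM(sales) AS total
    |FROM sales
    |GROUP BY region, category
    |GROUPING SETS ((region, category), (region), ())
  """.stripMargin).show()

// CUBE and ROLLUP use the Hive QL suffix form:
//   ... GROUP BY region, category WITH CUBE
//   ... GROUP BY region, category WITH ROLLUP
```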
At ActiveViam we also needed to run interactive OLAP queries on Apache Spark, so we built an extension called Sparkube that exposes a Spark dataset as a multidimensional cube, without moving the data.
Once your dataset is exposed that way, you get access to all the OLAP MDX functions directly on top of Spark, including DRILLDOWN and DRILLUP.
For instance, here is how you mount the content of a CSV file in memory and expose it as a multidimensional cube:
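A sketch of what this can look like. The `com.activeviam.sparkube` package and the `Sparkube` builder methods are assumptions here, not a verified API; check the Sparkube documentation for the exact names. On Spark 1.6, reading CSV also requires the external `com.databricks:spark-csv` package:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import com.activeviam.sparkube._  // assumed package name

val sc = new SparkContext(
  new SparkConf().setAppName("sparkube-demo").setMaster("local[*]"))
val sqlContext = new SQLContext(sc)

// Spark 1.6 needs the spark-csv package for the CSV data source.
val ds = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .load("path/to/file.csv")

// Expose the dataset as an in-memory multidimensional cube
// (illustrative builder API; consult the Sparkube docs).
new Sparkube()
  .fromDataset(ds)
  .withName("My first cube")
  .expose()
```

Once exposed, the cube can be queried with MDX from any compatible OLAP client, with the data staying in Spark.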