I have a Hive table that I am accessing through a Spark context. The table looks like this:
| key | param1 | param2 |
--------------------------
|  A  |  A11   |  A12   |
|  B  |  B11   |  B12   |
|  A  |  A21   |  A22   |
I want to create a DataFrame with the schema
val dataSchema = new StructType(
  Array(
    StructField("key", StringType, nullable = true),
    StructField("param", ArrayType(
      StructType(Array(
        StructField("param1", StringType, nullable = true),
        StructField("param2", StringType, nullable = true)
      )), containsNull = true), nullable = true)
  )
)
from the above table, so that the final table becomes
| key | param                                               |
--------------------------------------------------------------
|  A  | [{param1:A11, param2:A12},{param1:A21, param2:A22}] |
|  B  | [{param1:B11, param2:B12}]                          |
I am loading the table using the Hive context (hiveContext.table("table_name")), which returns a DataFrame:
scala> val df = hiveContext.table("sample")
df: org.apache.spark.sql.DataFrame = [key: string, param1: string, param2: string]
scala> val dfStruct = df.select($"key", struct($"param1", $"param2").alias("param"))
dfStruct: org.apache.spark.sql.DataFrame = [key: string, param: struct<param1:string,param2:string>]
scala> dfStruct.show
+---+---------+
|key|    param|
+---+---------+
|  A|[A11,A12]|
|  B|[B11,B12]|
|  A|[A21,A22]|
+---+---------+
I am trying to transform this DataFrame into the table above using groupBy, but I have not been able to get it to work.
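(For reference, on Spark 2.x and later the groupBy route does work directly, because collect_list can aggregate struct columns; on the 1.x HiveContext used here it may fail. A sketch, assuming a DataFrame df with the columns shown above:)

```scala
// Hedged sketch for Spark 2.x+: collect_list over a struct column
// groups each key's (param1, param2) pairs into one array.
import org.apache.spark.sql.functions.{collect_list, struct}

val result = df.groupBy($"key")
  .agg(collect_list(struct($"param1", $"param2")).alias("param"))
```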
I found the answer myself: the key is to use a case class rather than a StructType.
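The case-class idea can be sketched with plain Scala collections (no Spark needed to see the shape of the transformation); with a Spark 1.6+ Dataset the same logic goes through groupByKey and mapGroups. The Param name here is my own:

```scala
// Hypothetical sketch of the grouping logic with plain Scala collections;
// with a Spark Dataset the same shape comes from groupByKey + mapGroups.
case class Param(param1: String, param2: String)

val rows = Seq(
  ("A", "A11", "A12"),
  ("B", "B11", "B12"),
  ("A", "A21", "A22")
)

// Group by key and turn each (param1, param2) pair into a Param struct
val grouped: Map[String, Seq[Param]] =
  rows.groupBy(_._1).map { case (k, rs) =>
    k -> rs.map(r => Param(r._2, r._3))
  }
```

groupBy preserves the encounter order of rows within each group, so A's params come out as [A11/A12, A21/A22], matching the desired table.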