Explode nested list of objects into DataFrame in Spark


I have a DataFrame with a single column holding arrays of objects, like this:

| Column                                           |
|--------------------------------------------------|
| [{a: 2, b: 4}, {a: 2, b: 3}]                     |
| [{a: 12, b: 14}, {a: 25, b: 33}, {a: 22, b: 31}] |
...

I need to convert it to a DataFrame with one row per object, like this:

| a | b |
|---|---|
| 2 | 4 |
| 2 | 3 |
|12 |14 |

There is 1 answer

Leo C (best answer)

The simplest approach might be to use the Spark SQL function inline, which expands an array of structs into one row per element and one column per struct field:

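// In spark-shell, `spark` is predefined; elsewhere, create a SparkSession first.
import org.apache.spark.sql.functions.inline  // DataFrame-API function, Spark 3.4+
import spark.implicits._                      // enables toDF and the $"col" syntax

case class AB(a: Int, b: Int)

// One array-of-structs column named "arrAB", mirroring the question's data
val df = Seq(
    Seq(AB(2, 4), AB(2, 3)),
    Seq(AB(12, 14), AB(25, 33), AB(22, 31))
  ).toDF("arrAB")

df.select(inline($"arrAB")).show()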
/*
+---+---+
|  a|  b|
+---+---+
|  2|  4|
|  2|  3|
| 12| 14|
| 25| 33|
| 22| 31|
+---+---+
*/

Note that while inline has been part of the Spark SQL API since 2.0, it is available as a built-in function in the DataFrame API only on Spark 3.4+. To use it on older Spark versions, wrap it in expr as shown below:

import org.apache.spark.sql.functions.expr
df.select(expr("inline(arrAB)")).show()
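
Another option that should work on older Spark versions, not part of the original answer, is to combine explode with a star-expansion of the resulting struct. The sketch below assumes the same arrAB column of structs as above; the alias ab, the app name, and the local SparkSession are only there for illustration (in spark-shell, spark already exists and the builder lines can be dropped).

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.explode

// Local session for illustration only (assumption); spark-shell provides `spark`.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("explode-structs")
  .getOrCreate()
import spark.implicits._

case class AB(a: Int, b: Int)

// Same sample data as in the answer above
val df = Seq(
  Seq(AB(2, 4), AB(2, 3)),
  Seq(AB(12, 14), AB(25, 33), AB(22, 31))
).toDF("arrAB")

// explode emits one row per array element, as a single struct column "ab"
val exploded = df.select(explode($"arrAB").as("ab"))

// "ab.*" expands the struct's fields into top-level columns a and b
val flat = exploded.select("ab.*")
flat.show()
```

This takes two steps where inline takes one, so inline is probably still the tidier choice when it is available.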