I have an XML structure like this:
<root>
  <bookinfo>
    <time>1232314973</time>
    <requestID>233</requestID>
    <supplier>asd123</supplier>
  </bookinfo>
  <books>
    <book>
      <name>book1</name>
      <pages>124</pages>
    </book>
    <book>
      <name>book2</name>
      <pages>456</pages>
    </book>
    <book>
      <name>book4</name>
      <pages>789</pages>
    </book>
  </books>
</root>
I know that I can parse the books like this:
val xml = sqlContext.read.format("com.databricks.spark.xml")
.option("rowTag", "book").load("FILENAME")
But I would like to add the header information, such as the supplier, to each of the rows.
Is there a way to add this header info to all rows with Spark, without loading the file twice and storing the info in global vars/vals?
Thanks in advance!
You can read the whole XML starting from the "root" tag, and then explode the required tags:
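A minimal sketch of this approach, assuming the spark-xml package is on the classpath and that "FILENAME" is the same placeholder path as in the question. The object name, the SparkSession setup, and the output column aliases are illustrative choices, not something taken from the original post:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, explode}

object BooksWithHeader {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session for illustration; in spark-shell a session already exists.
    val spark = SparkSession.builder()
      .appName("books-with-header")
      .master("local[*]")
      .getOrCreate()

    // Read with rowTag = "root", so each <root> element becomes one row that
    // holds <bookinfo> as a struct and <books>/<book> as an array of structs.
    val df = spark.read
      .format("com.databricks.spark.xml")
      .option("rowTag", "root")
      .load("FILENAME") // placeholder path from the question

    // Explode the array of books into one row per book; the header fields
    // from <bookinfo> are repeated on every resulting row.
    val result = df
      .withColumn("book", explode(col("books.book")))
      .select(
        col("bookinfo.time").as("time"),
        col("bookinfo.requestID").as("requestID"),
        col("bookinfo.supplier").as("supplier"),
        col("book.name").as("name"),
        col("book.pages").as("pages")
      )

    result.show()
    spark.stop()
  }
}
```

The file is loaded only once and no global vals are needed. One trade-off to keep in mind: with "root" as the row tag, each whole `<root>` element is parsed as a single record, so this is likely to suit files where one root element fits comfortably in memory.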
Output: