I am trying to load the following XML content using SerDe with Hive :
<?xml version="1.0"?>
<RootTag xmlns:i="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.website.com/service">
<Code>123</Code>
<ParentElement>
<Entity>
<EntityId>A</EntityId>
<EntityCode i:nil="true"/>
</Entity>
<Entity>
<EntityId>M</EntityId>
<EntityCode i:nil="true"/>
</Entity>
</ParentElement>
</RootTag>
The hive table was created as follows:
CREATE EXTERNAL TABLE database.mytable(
code String,
Entity array<struct<Entity:struct<EntityId:String,EntityCode:String>>>
)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES
(
"column.xpath.Code" = "/RootTag/Code/text()",
"column.xpath.ParentElement" = "/RootTag/ParentElement"
)
STORED AS
INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
LOCATION '/xml_content/'
TBLPROPERTIES ("xmlinput.start" = "<RootTag", "xmlinput.end" = "</RootTag>");
I have 2 issues:
- when I specify the "xmlinput.start" as specified above, it doesn't work. I had to remove manually the content "xmlns:i = ... /service" next to the "RootTage" in order to start parsing the xml.
- Even though, there is another issue with "EntityCode" attribute. I am getting the error:
Caused by: org.apache.hive.service.cli.HiveSQLException:
java.io.IOException:
org.apache.hadoop.hive.serde2.SerDeException:
java.lang.RuntimeException:
org.xml.sax.SAXParseException;
lineNumber: 41;
columnNumber: 33;
The prefix "i" for attribute "i:nil" associated with an element type "ParentCode" is not bound.
What am I doing wrong? I would appreciate your suggestions and comments regarding this.
There are few things.
i:nil="true"
column.xpath.Code
andcode
won't work. Also there is no corrosponding column forcolumn.xpath.ParentElement
So what you need to do?
<EntityCode i:nil="true"/>
to<EntityCode/>
in your xmlCode