i have the following input XML:
<?xml version="1.0"?>
<Employees>
<Employee emplid="1111">
<lastname>Watson</lastname>
<age>30</age>
<email>[email protected]</email>
</Employee>
<Employee emplid="2222">
<firstname>Sherlock</firstname>
<lastname>Holmes</lastname>
<age>32</age>
<email>[email protected]</email>
</Employee>
</Employees>
Please notice the firstname missing from the employee 1111
I'm executing following select:
select
c1.emplid,
fname,
lname
from(
select emplid, xmldata from employeeXML
LATERAL VIEW explode (xpath(xmldata,'/Employees/Employee/@emplid')) dummyTable as emplid )c1
LATERAL VIEW explode (xpath(xmldata,concat('/Employees/Employee[@id="',c1.emplid,'"',']/firstname/text()')))dummyTable2 as fname
LATERAL VIEW explode (xpath(xmldata,concat('/Employees/Employee[@id="',c1.emplid,'"',']/lastname/text()'))) dummyTable3
as lname;
The expected result :
1111 NULL Watson
2222 Sherlock Holmes
Please notice that NULL value for the missing first name)
however i'm getting the following result:
2222 Sherlock Holmes
Becasue the first name is missing for the employee 1111, i'm not getting the first employee back in my query. Is there a way to get both employee data back as indicated in the expected result with first name set to NULL and/or space when it is missing ? Please help. thanks,
You can always concatenate the result with an empty string, this should probably be fine:
This is not the concatenate you used in hive, but an internal XPath function, so you will probably apply both the XPath and Hive concat in one line.
By the way, I guess you want to use
@emplid
instead of@id
to match your data?