How to create a table in Hive with a column of data type array<map<string, string>>

44.3k views Asked by At

I am trying to create a table which has a complex data type. And the data types are listed below.

  1. array

  2. map

  3. array< map < String,String> >

I am trying to create a data structure of 3 type . Is it ever possible to create in Hive? My table DDL looks like below.

create table complexTest(names array<String>,infoMap map<String,String>, deatils array<map<String,String>>)           
row format delimited                                                                                       
fields terminated by '/'                                                                                   
collection items terminated by '|'                                                                         
map keys terminated by '='                                                                                 
lines terminated by '\n';

And my sample data looks like below.

Abhieet|Test|Complex/Name=abhi|age=31|Sex=male/Name=Test,age=30,Sex=male|Name=Complex,age=30,Sex=female

Whever i am querying the data from the table i am getting the below values

["Abhieet"," Test"," Complex"]  {"Name":"abhi","age":"31","Sex":"male"} [{"Name":null,"Test,age":null,"31,Sex":null,"male":null},{"Name":null,"Complex,age":null,"30,Sex":null,"female":null}]

Which is not i am expecting. Could you please help me to find out what should be the DDL if it ever possible for data type array< map < String,String>>

4

There are 4 answers

0
maxymoo On

I don't think this is possible using the inbuilt serde. If you know in advance what the values in your maps are going to be, then I think a better way of approaching this would be to convert your input data to JSON, and then use the Hive json serde:

Sample data:

{'Name': ['Abhieet', 'Test', 'Complex'],
'infoMap': {'Sex': 'male', 'Name': 'abhi', 'age': '31'},
 'details': [{'Sex': 'male', 'Name': 'Test', 'age': '30'}, {'Sex': 'female', 'Name': 'Complex', 'age': '30'}]
 }

Table definition code:

create table complexTest
(
names array<string>,
infomap struct<Name:string,
               age:string,
               Sex:string>,
details array<struct<Name:string,
               age:string,
               Sex:string>>
)
row format serde 'org.openx.data.jsonserde.JsonSerDe'
0
Tutu Kumari On

Sample data: {"Name": ["Abhieet", "Test", "Complex"],"infoMap": {"Sex":"male", "Name":"abhi", "age":31},"details": [{"Sex":"male", "Name":"Test", "age":30}, {"Sex":"female", "Name":"Complex", "age":30}]}

Table definition code:

#hive>
create table complexTest
(names array<string>,infomap struct<Name:string,
               age:string,
               Sex:string>,details array<struct<Name:string,
               age:string,
               Sex:string>>)
row format serde 'org.apache.hive.hcatalog.data.JsonSerDe'

0
Saanidhya Sharma On

Hive table can be created with parquet as file format.

Sample schema:

create table complexTest(
names array<String>,
infoMap struct<sex:string, name:string, age:string>, 
deatils array<struct<sex:string, name:string, age:string>>
)
stored as parquet
0
Madhavan On

This can be handled with array of structs using the following query.

create table complexStructArray(custID String,nameValuePairs array<struct< key:String, value:String>>) row format delimited fields terminated by '/' collection items terminated by '|' map keys terminated by '=' lines terminated by '\n';

Sample data:

101/Name=Madhavan|age=30

102/Name=Ramkumar|age=31

Though struct allows duplicate key values unlike Map, above query should handle the ask if the data is having unique key values.

select query would give the output as follows.

hive> select * from complexStructArray;

101 [{"key":"Name","value":"Madhavan"},{"key":"age","value":"30"}]

102 [{"key":"Name","value":"Ramkumar"},{"key":"age","value":"31"}]