My data is stored in HDFS at directory /tmp/kafka/alert in multiple files. Each file contain new-line separated JSON objects like following.
{"alertHistoryId":123456,"entityId":123,"deviceId":"123","alertTypeId":1,"AlertStartDate":"Dec 28, 2016 12:05:48 PM"}
{"alertHistoryId":123456,"entityId":125,"deviceId":"125","alertTypeId":5,"AlertStartDate":"Dec 28, 2016 11:58:48 AM"}
I added hive JSON SerDe jar using below
ADD JAR /usr/local/downloads/hive-serdes-1.0-SNAPSHOT.jar;
I created table with following
CREATE EXTERNAL TABLE IF NOT EXISTS my_alert (
alertHistoryId bigint, entityId bigint, deviceId string, alertTypeId int, AlertStartDate string
)
ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
LOCATION '/tmp/kafka/alert';
table created successfully. But when I fetched data, I got all null values. Anyone got any idea how to resolve this?
You are using old version of JSON Serde. There might be an issue with your JSON Serde and Hadoop Distribution. Please find below link to get new version of Json Serde. Follow the steps from the link to build it according to your Hadoop distribution.
https://github.com/rcongiu/Hive-JSON-Serde
Please see below working example.
How to build jar.
Maven should be installed on your PC then run command like this.
C:\Users\User1\Downloads\Hive-JSON-Serde-develop\Hive-JSON-Serde-develop>mvn -Phdp23 clean package
In my case I am using hdp2.3 so I have provided
-Phdp23
Hope it will help if you are willing to use Hive JSON Serde.