How to check for a missing key in JSON using Pig?


I have a JSON file with varying schema.

{"asin":"xxxxxx", "title":"xxxsomething"}
{"asin":"yyyyyy"}
{"asin":"zzzzzz", "title":"zzzsomething"}

I have written a Pig script that uses Twitter's elephant-bird library to load the JSON data and convert it into a tab-separated file.

However, if a line in the input JSON file is missing the "title" key (line 2 in the example above), the TSV file has nothing in its place either:

xxxxxx  xxxsomething
yyyyyy  
zzzzzz  zzzsomething

I would like to supply a custom default value when a particular key is missing. How can I do this in Pig Latin?

Expected output:

xxxxxx  xxxsomething
yyyyyy  default_string
zzzzzz  zzzsomething

Here's my script:

REGISTER elephant-bird-elephant-bird-4.13/pig/target/elephant-bird-pig-4.13.jar;
REGISTER elephant-bird-elephant-bird-4.13/hadoop-compat/target/elephant-bird-hadoop-compat-4.13.jar;
REGISTER elephant-bird-elephant-bird-4.13/core/target/elephant-bird-core-4.13-thrift9.jar;

reviews = load '../data/Amazon/meta_Amazon_Instant_Video.json'
          using com.twitter.elephantbird.pig.load.JsonLoader();

tabs = FOREACH reviews generate (chararray)$0#'asin' as asin_new, (chararray)$0#'title';

A = ORDER tabs BY asin_new;
DESCRIBE A;

STORE A INTO 'hdfs://localhost:9000/meta_Amazon_Instant_Video.tsv';

There is 1 answer

hello_abhishek

You can write a UDF for this: have it check whether the value is empty (or null) and, if it is, return the default string instead.
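For a simple default like this, a null check inside the FOREACH may be enough without a separate UDF, since looking up a key that is absent from a Pig map yields null. The following is only a sketch under that assumption: it reuses the paths and the default_string value from the question, the alias title_new is a name made up here for illustration, and it has not been run against the actual data.

```pig
-- Same jars as in the question.
REGISTER elephant-bird-elephant-bird-4.13/pig/target/elephant-bird-pig-4.13.jar;
REGISTER elephant-bird-elephant-bird-4.13/hadoop-compat/target/elephant-bird-hadoop-compat-4.13.jar;
REGISTER elephant-bird-elephant-bird-4.13/core/target/elephant-bird-core-4.13-thrift9.jar;

reviews = LOAD '../data/Amazon/meta_Amazon_Instant_Video.json'
          USING com.twitter.elephantbird.pig.load.JsonLoader();

-- A missing key in the map gives null, so the bincond (cond ? a : b)
-- substitutes the default only for records without a "title".
-- title_new is a hypothetical alias chosen for this sketch.
tabs = FOREACH reviews GENERATE
           (chararray)$0#'asin' AS asin_new,
           ((chararray)$0#'title' IS NULL
                ? 'default_string'
                : (chararray)$0#'title') AS title_new;

A = ORDER tabs BY asin_new;
STORE A INTO 'hdfs://localhost:9000/meta_Amazon_Instant_Video.tsv';
```

If a key can be present but hold an empty string, the condition would also need to test for that (for example with an additional `== ''` comparison). A UDF becomes more attractive once the same rule has to be applied to many fields.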