I am new to Pig scripting and need some help with this issue.
I have two sets of bags in Pig. I want to take all the fields from the first bag and, wherever the second bag has data for the same field, overwrite the first bag's value with it.
The column list is dynamic (columns may be added or deleted at any time). Set b may also start carrying data in other fields that are currently blank; if so, set a needs to be overwritten with the data available in set b.
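The overwrite rule itself is small: for each position, keep b's value when it is non-blank, otherwise keep a's. Below is a minimal Python sketch of just that rule, assuming a "blank" field arrives as None or an empty string and that both rows share the same column order. `overlay_row` is a hypothetical helper name, and the rows are shortened versions of the example data. Because it walks whatever fields are present, it does not hard-code any column names.

```python
def is_blank(value):
    """Treat None and empty/whitespace-only strings as 'no data' (assumption)."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def overlay_row(row_a, row_b):
    """Return row_a with every non-blank field of row_b written over it.

    Fields are matched by position. Only positions that exist in row_a are
    considered, so extra trailing fields in row_b are ignored and, if row_b
    is shorter, the remaining fields of row_a are kept unchanged.
    """
    merged = list(row_a)
    for i, value in enumerate(row_b[:len(merged)]):
        if not is_blank(value):
            merged[i] = value
    return tuple(merged)


a_row = ("1", "test", "", "city", "2020-06-08T18:31:09.000Z", "west", "A")
b_row = ("1", "", "", "", "2020-09-08T19:31:09.000Z", "", "N")
print(overlay_row(a_row, b_row))
# ('1', 'test', '', 'city', '2020-09-08T19:31:09.000Z', 'west', 'N')
```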
Columns: uniqueid, catagory, b, c, d, e, f, region, g, h, date, direction, indicator
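Since columns can be added or removed, matching fields by name rather than by position can be more robust when a and b do not share an identical layout. A Python sketch of that variant, assuming each record can be represented as a dict keyed by the column names listed above; `overlay_by_name` is a hypothetical helper. The output keeps exactly a's columns, which is a design choice on my part, not something stated in the question.

```python
COLUMNS = ["uniqueid", "catagory", "b", "c", "d", "e", "f",
           "region", "g", "h", "date", "direction", "indicator"]


def is_blank(value):
    # Assumption: None or an empty/whitespace-only string means "no data".
    return value is None or (isinstance(value, str) and value.strip() == "")


def overlay_by_name(rec_a, rec_b):
    """Keep every column of rec_a; take rec_b's value where it is non-blank.

    Columns missing from rec_b keep a's value; columns present only in
    rec_b are ignored, so the result has the same columns as rec_a.
    """
    return {col: (rec_b[col] if not is_blank(rec_b.get(col)) else val)
            for col, val in rec_a.items()}


rec_a = dict(zip(COLUMNS, ["1", "test", "", "", "", "", "", "city", "", "",
                           "2020-06-08T18:31:09.000Z", "west", "A"]))
rec_b = {"uniqueid": "1", "date": "2020-09-08T19:31:09.000Z", "indicator": "N"}
merged = overlay_by_name(rec_a, rec_b)
print([merged[c] for c in COLUMNS])
```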
Example:
all_data= COGROUP a by (uniqueid), b by (uniqueid);
Output of the COGROUP:
(1,{(1,test,,,,,,,,city,,,,,2020-06-08T18:31:09.000Z,west,,,,,,,,,,,,,A)},{(1,,,,,,,,,,,,,,2020-09-08T19:31:09.000Z,,,,,,,,,,,,,,N)})
(2,{(2,test2,,,,,,,,dist,,,,,2020-08-02T13:06:16.000Z,east,,,,,,,,,,,,A)},{(2,,,,,,,,,,,,,,2020-09-08T18:31:09.000Z,,,,,,,,,,,,,,N)})
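COGROUP emits one record per key: the key, then a bag of the matching a tuples, then a bag of the matching b tuples (a bag is empty when that side has no match). That is why the output above has the shape (uniqueid, {a tuples}, {b tuples}). The Python sketch below mimics that grouping and then applies the overwrite rule per group. It assumes only the first tuple on each side matters for a given uniqueid, which the question does not state; `cogroup` and `merge_groups` are hypothetical names, and the rows are the example rows with the all-blank columns dropped.

```python
from collections import OrderedDict


def is_blank(value):
    # Assumption: None or an empty/whitespace-only string means "no data".
    return value is None or (isinstance(value, str) and value.strip() == "")


def cogroup(rel_a, rel_b, key_index=0):
    """Mimic Pig's COGROUP: key -> (bag of a tuples, bag of b tuples).

    Pig does not guarantee output order; here keys follow first appearance.
    """
    groups = OrderedDict()
    for side, rel in ((0, rel_a), (1, rel_b)):
        for row in rel:
            groups.setdefault(row[key_index], ([], []))[side].append(row)
    return groups


def merge_groups(groups):
    """For each key, overlay the first b tuple onto the first a tuple."""
    result = []
    for key, (bag_a, bag_b) in groups.items():
        if not bag_a:  # key only in b: nothing to overwrite, so skip it
            continue
        merged = list(bag_a[0])
        if bag_b:
            for i, value in enumerate(bag_b[0][:len(merged)]):
                if not is_blank(value):
                    merged[i] = value
        result.append(tuple(merged))
    return result


a = [("1", "test", "city", "2020-06-08T18:31:09.000Z", "west", "A"),
     ("2", "test2", "dist", "2020-08-02T13:06:16.000Z", "east", "A")]
b = [("1", "", "", "2020-09-08T19:31:09.000Z", "", "N"),
     ("2", "", "", "2020-09-08T18:31:09.000Z", "", "N")]

for row in merge_groups(cogroup(a, b)):
    print(row)
# ('1', 'test', 'city', '2020-09-08T19:31:09.000Z', 'west', 'N')
# ('2', 'test2', 'dist', '2020-09-08T18:31:09.000Z', 'east', 'N')
```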
Expected Result:
(1,test,,,,,,,,city,,,,,2020-09-08T19:31:09.000Z,west,,,,,,,,,,,,,N)
(2,test2,,,,,,,,dist,,,,,2020-09-08T18:31:09.000Z,east,,,,,,,,,,,,N)
I was able to achieve the expected output with the statement below:
final = FOREACH all_data GENERATE flatten($1), flatten($2.(region)) as region, flatten($2.(indicator)) as indicator;
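The statement above names region and indicator explicitly, so it has to be edited whenever b starts carrying data in another column. One possible way around that (an assumption, not something confirmed for this setup) is a Jython UDF that receives both bags from the COGROUP and overlays them field by field. The sketch assumes Pig passes a bag to a Jython UDF as a list of tuples and injects the `outputSchema` decorator at runtime; the fallback definition only exists so the same file also runs under plain Python. The file name `merge_udf.py`, the function `merge_bags`, and the all-chararray schema built from the column list are hypothetical and would need to match the real relation.

```python
# merge_udf.py: hypothetical file name; from Pig it could be registered with
#   REGISTER 'merge_udf.py' USING jython AS merge_udfs;
try:
    outputSchema  # assumed to be provided by Pig's Jython runtime
except NameError:
    def outputSchema(schema):
        """No-op stand-in so the file also runs under plain Python."""
        def decorator(func):
            return func
        return decorator

try:
    STRING_TYPES = (str, unicode)  # Python 2 / Jython
except NameError:
    STRING_TYPES = (str,)  # Python 3

# Column list from the question; must match the field order of relation a.
COLUMNS = ["uniqueid", "catagory", "b", "c", "d", "e", "f",
           "region", "g", "h", "date", "direction", "indicator"]
# Assumption: every column is a chararray; adjust names/types to the real schema.
MERGED_SCHEMA = "merged:tuple(%s)" % ", ".join(c + ":chararray" for c in COLUMNS)


def _is_blank(value):
    # Assumption: null or an empty/whitespace-only chararray means "no data".
    return value is None or (isinstance(value, STRING_TYPES) and value.strip() == "")


@outputSchema(MERGED_SCHEMA)
def merge_bags(bag_a, bag_b):
    """Overlay the first tuple of bag_b onto the first tuple of bag_a."""
    if not bag_a:
        return None  # key present only in b: nothing to overwrite
    merged = list(bag_a[0])
    if bag_b:
        for i, value in enumerate(list(bag_b[0])[:len(merged)]):
            if not _is_blank(value):
                merged[i] = value
    return tuple(merged)
```

From Pig the call might then look like `final = FOREACH all_data GENERATE FLATTEN(merge_udfs.merge_bags(a, b));`, since after `COGROUP a BY ..., b BY ...` the two bags are named after the input relations. The merge loop never mentions a column by name; only the declared output schema does, so keeping that schema in sync with the data is the remaining manual step.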