How can I segregate data into groups using Apache Pig?


I have data in CSV format with the columns "movie name" and price. My output should be:

Under 5   : 5200
5-10      : 500
10-15     : 5140

and so on

I tried the code below:

A = LOAD '/root/pig-0.13.0/scripts/dvd_data/dvd_csv.txt' using PigStorage(',');
B = FOREACH A GENERATE REPLACE($0, '\\"', ''), $2, $6;

I am unable to work out the logic to get the desired output. I am looking for some help with it.

There is 1 answer

Murali Rao

If the use case is to count the movies in a fixed set of price buckets (under 5, 5 up to but not including 10, 10 up to but not including 15, and so on), we can use the bincond operator (condition ? value_if_true : value_if_false), nested once per bucket.

Pig Script :

A = LOAD 'a.csv' USING PigStorage(',') AS (movie_name:chararray,price:long);
B = FOREACH A GENERATE ((price < 5) ? '5' : ((price < 10) ? '5-10' : ((price < 15) ? '10-15' : '>15'))) AS key, price;
C = GROUP B BY key;
D = FOREACH C GENERATE group, COUNT(B);
DUMP D;

Sample Input : a.csv :

Movie1,1
Movie2,2
Movie3,3
Movie4,4
Movie5,5
Movie7,7
Movie9,9
Movie10,10
Movie11,11
Movie12,12

Output : DUMP D :

(5,4)
(5-10,3)
(10-15,3)
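
Since the question asks for buckets that continue "and so on", nesting one bincond per bucket may become unwieldy. A minimal sketch of an alternative, assuming the same a.csv schema as above, non-negative prices, and a fixed bucket width of 5 (the aliases bucket_start, bucket_end and movie_count are names chosen here for illustration), derives the bucket from the price with integer division:

```pig
-- Sketch: derive each bucket from the price instead of nesting bincond.
-- Assumes the a.csv schema used above and buckets that are 5 units wide.
A = LOAD 'a.csv' USING PigStorage(',') AS (movie_name:chararray, price:long);

-- Dividing a long by an integer truncates, so (price / 5) * 5 is the
-- lower bound of the bucket that the price falls into.
B = FOREACH A GENERATE (price / 5) * 5 AS bucket_start;

C = GROUP B BY bucket_start;

-- Emit the lower bound, the exclusive upper bound, and the number of movies.
D = FOREACH C GENERATE group AS bucket_start, group + 5 AS bucket_end, COUNT(B) AS movie_count;

E = ORDER D BY bucket_start;
DUMP E;
```

With the sample input above, this should group prices 1 to 4 under bucket_start 0, prices 5, 7 and 9 under 5, and prices 10 to 12 under 10. New buckets appear automatically as higher prices occur, at the cost of losing the custom labels that the bincond version produces.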