Apache PIG - ERROR org.apache.pig.impl.PigContext - Encountered " <OTHER> ",= "" at line 1, column 1


I'm trying to do some data cleansing with Apache Pig on data loaded from a Hive table.

This is my Pig script:

INPUT_FILE = LOAD 'staging_area' USING org.apache.hive.hcatalog.pig.HCatLoader()
AS
          (ID:Long, 
          CHAIN:Int,
          DEPT:Int,
          CATEGORY:Int,
          COMPANY:Long,
          BRAND:Long,
          DATE:Chararray,
          QUARTER:Int,
          MONTH:Int,
          DAY:Int,
          WEEKDAY:Int,
          PRODUCT_SIZE:Int,
          PRODUCT_MEASURE:Chararray,
          PRODUCT_QUANTITY:Int,
          PURCHASE_AMOUNT:Double);

SPLIT INPUT_FILE INTO DATA IF (PRODUCT_SIZE > 0 AND PURCHASE_AMOUNT > 0 AND PRODUCT_QUANTITY > 0), MISSING_VALUES if (PRODUCT_QUANTITY <= 0 OR PURCHASE_AMOUNT <= 0);

DATA_TRANSFORMATION = FOREACH DATA GENERATE 
                                            ID,
                                            CHAIN,
                                            DEPT,
                                            CATEGORY,
                                            ToDate(DATE,'yyyy-MM-dd') as DATE_ID,
                                            QUARTER,
                                            MONTH,
                                            DAY,
                                            WEEKDAY,
                                            PRODUCT_SIZE,
                                            PURCHASE_AMOUNT;

GRP = GROUP DATA_TRANSFORMATION BY ID;

SUMMED = foreach GRP {
     amount = SUM(DATA_TRANSFORMATION.PURCHASE_AMOUNT);
     cnt = COUNT(DATA_TRANSFORMATION.ID);
     generate group, Purchase_Average,Freq_Visits;
}

JOINED = join DATA_TRANSFORMATION by $0, SUMMED by $0;

DATASET = FOREACH JOINED GENERATE $0,$1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12;

RANKING = rank DATASET by $6,$1,$0;

DW = FOREACH RANKING GENERATE $1 as ID,$2 as Purchase_Average, $3 as Freq_Visits, $0 as Transaction_ID, $4,$5,$6,$7,$8,$9,$10,$11,$12,$13;

STORE DW INTO '/user/cloudera/data' USING PigStorage(',');

The Hive table contains this data (first 10 rows):

id          chain  dept  category  company     brand  date_id           quarter  month_id  day_id  weekday  productsize  productmeasure  purchasequantity  purchaseamount
1940424003  46     99    9909      1081843181  25935  29-01-2013 00:00  1        1         29      2        6            OZ              2                 5
1940424003  46     35    3504      103500030   13470  04-02-2013 00:00  1        2         4       1        25           OZ              2                 5
1940424003  46     91    9115      108048080   1230   08-02-2013 00:00  1        2         8       5        0            LT              1                 13.99
1940452798  46     7     706       101200010   17286  09-02-2013 00:00  1        2         9       6        38           OZ              1                 5.75
1940452798  46     45    4517      107220575   17340  10-02-2013 00:00  1        2         10      7        16           OZ              1                 45
1940452798  46     99    9909      107143070   5072   10-02-2013 00:00  1        2         10      7        12           OZ              1                 1.99
1940452798  46     21    2119      1061300868  867    10-02-2013 00:00  1        2         10      7        138          OZ              1                 43.8
1940452798  46     56    5616      1071373373  11473  10-02-2013 00:00  1        2         10      7        8            OZ              1                 2.5
1940452798  46     7     706       107146474   2142   10-02-2013 00:00  1        2         10      7        15           OZ              1                 2
1940452798  46     72    7205      103700030   4294   22-02-2013 00:00  1        2         22      5        6            OZ              1                 3

Every time I run the script, I get this error:

ERROR org.apache.pig.impl.PigContext - Encountered " <OTHER> ",= "" at line 1, column 1

Does anyone know how to solve this? My data set has 3,000,000 records, and I'm using the Cloudera QuickStart VM 5.8.

Best answer, by Alexey:
SUMMED = foreach GRP {
     amount = SUM(DATA_TRANSFORMATION.PURCHASE_AMOUNT);
     cnt = COUNT(DATA_TRANSFORMATION.ID);
     generate group, Purchase_Average,Freq_Visits;
}

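You can't project Purchase_Average and Freq_Visits here, because no aliases with those names exist. Inside this nested block, GENERATE can only refer to group, the DATA_TRANSFORMATION bag, and the aliases declared in the block itself (amount and cnt).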
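A corrected SUMMED relation might look like the sketch below. It assumes the intent was for Purchase_Average to hold what the original bound to amount (the summed PURCHASE_AMOUNT) and for Freq_Visits to hold cnt (the row count per ID), and it computes both directly in a flat FOREACH ... GENERATE instead of a nested block. SUMMED still has three fields, so the positional references ($0, $1, ...) used later in the script keep their meaning.

```pig
-- Sketch: assumes Purchase_Average = summed PURCHASE_AMOUNT (the original 'amount')
-- and Freq_Visits = number of rows per ID (the original 'cnt').
SUMMED = FOREACH GRP GENERATE
             group,                                                    -- the grouping key (ID)
             SUM(DATA_TRANSFORMATION.PURCHASE_AMOUNT) AS Purchase_Average,
             COUNT(DATA_TRANSFORMATION.ID) AS Freq_Visits;             -- rows per ID
```

If a true average was intended rather than a sum, the built-in AVG(DATA_TRANSFORMATION.PURCHASE_AMOUNT) could replace the SUM call.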