Pig Latin join by field


I have a Pig Latin problem.

I have the data below (a single row):

A = LOAD 'records' AS (f1:chararray, f2:chararray, f3:chararray, f4:chararray, f5:chararray, f6:chararray);
DUMP A;

(FITKA,FINVA,FINVU,FEEVA,FETKA,FINVA)

Now I have another dataset:

B = LOAD 'values' AS (f1:chararray, f2:chararray);
Dump B;
(FINVA,0.454535)
(FITKA,0.124411)
(FEEVA,0.123133)

I would like to join these two datasets: for each value in dataset A, look up the corresponding value in dataset B and place it beside the value from A. The expected output is below:

FITKA 0.124411, FINVA 0.454535 and so on ..
(They can also look like: FITKA, 0.124411, FINVA, 0.454535 and so on ..)

Then I could multiply the values (0.124411 x 0.454535 and so on), because they would all be on the same row, which is what I want.

Of course I can join column by column, but then the joined values are appended at the end of the row and I have to rearrange them with another FOREACH ... GENERATE. I want a simpler solution without too many joins, which may cause performance issues.

Dataset A is text (each row is, in a sense, a sentence).

What are my options to achieve this? Any help would be appreciated.

1 answer

Answered by glefait:

A sentence can be represented as a tuple that contains a bag of (word, count) tuples.

Therefore, I suggest you change the way you store your data to the following format:

sentence:tuple(words:bag{wordcount:tuple(word, count)})
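If the data cannot be re-stored in that format, one way to get an equivalent shape from the existing flat rows is to turn the six columns into a bag with TOBAG, flatten it, and join once. The following is only a sketch under stated assumptions, not necessarily what the answer had in mind: the aliases (A_ranked, words, joined, logs, grouped, result) and the field names word and val are made up for illustration, the second column of 'values' is loaded as double instead of chararray so it can be multiplied, and RANK (available from Pig 0.11) is used only to give each row an id. Pig has no built-in product aggregate, so the product is computed as EXP(SUM(LOG(val))), which is valid only for strictly positive values.

```pig
-- 'records' and 'values' are the inputs from the question.
A = LOAD 'records' AS (f1:chararray, f2:chararray, f3:chararray,
                       f4:chararray, f5:chararray, f6:chararray);
-- Assumption: load the value as double so it can be used in arithmetic.
B = LOAD 'values' AS (word:chararray, val:double);

-- Prepend a row id so words can be regrouped per sentence later.
A_ranked = RANK A;

-- One (id, word) row per word instead of six columns per sentence.
words = FOREACH A_ranked GENERATE
            $0 AS id,
            FLATTEN(TOBAG(f1, f2, f3, f4, f5, f6)) AS word;

-- A single join attaches the value to every word.
-- This is an inner join: words with no entry in B are dropped.
joined = JOIN words BY word, B BY word;

-- Keep the value and its logarithm (log is defined only for val > 0).
logs = FOREACH joined GENERATE
           words::id   AS id,
           words::word AS word,
           B::val      AS val,
           LOG(B::val) AS logval;

-- Regroup per sentence and compute the product as exp(sum(log(val))).
grouped = GROUP logs BY id;
result = FOREACH grouped GENERATE
             group              AS id,
             logs.(word, val)   AS wordvals,
             EXP(SUM(logs.logval)) AS product;

DUMP result;
```

Two caveats with this sketch: words that have no entry in B (FINVU and FETKA in the sample data) do not contribute to the product, and the order of words inside the wordvals bag is not guaranteed to match their order in the original row. A LEFT OUTER join would keep the unmatched words, but their null values would then need to be handled before taking the logarithm.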