Un-nesting nested tuples to single terms

Question

504 views Asked by Stefanos13 At 07 June 2015 at 12:46

I have written a UDF (extending EvalFunc<Tuple>) whose output is a tuple containing nested inner tuples.

For example, the DUMP output looks like this:

(((photo,photos,photo)))
(((wedg,wedge),(audusd,audusd)))
(((quantum,quantum),(mind,mind)))
(((cassi,cassie),(cancion,canciones)))
(((calda,caldas),(nova,novas),(rodada,rodada)))
(((fingerprint,fingerprint),(craft,craft),(easter,easter)))

Now I want to process each of these terms: keep only the distinct ones and give each an id (RANK). To do this, I need to get rid of the nested brackets. A simple FLATTEN does not help in this case.

The final output should be like:

1 photo
2 photos
3 wedg
4 wedge
5 audusd
6 quantum
7 mind
....

My code (without the UDF and the raw parsing):

tags = FOREACH raw GENERATE FLATTEN(tags) AS tag;
tags_distinct = DISTINCT tags;
tags_sorted = RANK tags_distinct BY tag;
DUMP tags_sorted;

There is 1 answer

**glefait** · Answer 1 · 2015-06-07T17:59:04+00:00

I think your UDF's return type is not optimal for your workflow. Instead of returning a tuple with a variable number of fields (each of which is itself a tuple), it would be much more convenient to return a bag of tuples.

Instead of

(((wedg,wedge),(audusd,audusd)))

you will have

({(wedg,wedge),(audusd,audusd)})

and you will be able to FLATTEN that bag and then:

1. apply DISTINCT
2. RANK the tags

To do so, update your UDF like this:

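import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.BagFactory;
import org.apache.pig.data.DataBag;
import org.apache.pig.data.Tuple;

public class MyUDF extends EvalFunc<DataBag> {
    @Override
    public DataBag exec(Tuple input) throws IOException {
        // Create the DataBag, then add your tuples to it (filling logic omitted here)
        DataBag bag = BagFactory.getInstance().newDefaultBag();
        return bag;
    }
}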
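If it helps, here is a rough sketch of the un-nesting logic itself. It only uses plain Java lists as stand-ins for Pig's tuples and bags, so it runs without Pig on the classpath (Java 9+ for List.of). The names UnnestSketch, collectTerms and rankDistinctTerms are made up for this illustration and are not part of any Pig API. In a real UDF, each collected term would presumably go into the returned bag as its own single-field tuple, so that FLATTEN yields one term per row.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration: nested lists stand in for Pig's nested tuples.
class UnnestSketch {

    // Recursively collect every non-list leaf from arbitrarily nested lists.
    static void collectTerms(Object node, List<String> out) {
        if (node instanceof List<?>) {
            for (Object child : (List<?>) node) {
                collectTerms(child, out);
            }
        } else if (node != null) {
            out.add(node.toString());
        }
    }

    // Un-nest, de-duplicate and sort the terms, then number them from 1.
    // This roughly mimics FLATTEN + DISTINCT + RANK ... BY tag.
    static List<String> rankDistinctTerms(List<?> rows) {
        List<String> terms = new ArrayList<>();
        collectTerms(rows, terms);
        TreeSet<String> distinct = new TreeSet<>(terms); // sorted, no duplicates
        List<String> ranked = new ArrayList<>();
        int rank = 1;
        for (String term : distinct) {
            ranked.add(rank + " " + term);
            rank++;
        }
        return ranked;
    }

    public static void main(String[] args) {
        // Two rows shaped like the first two lines of the DUMP in the question.
        List<?> rows = List.of(
            List.of(List.of(List.of("photo", "photos", "photo"))),
            List.of(List.of(List.of("wedg", "wedge"), List.of("audusd", "audusd"))));
        rankDistinctTerms(rows).forEach(System.out::println);
    }
}
```

Note that the ranks here follow alphabetical order of the terms, since the set is sorted before numbering; the sample output in the question lists terms in order of appearance instead.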
