Stratified sampling in Pig?

Question

648 views · Asked by Scratch on 03 January 2014 at 17:51

Does anyone have an idea of how to do stratified sampling in Pig? (Wikipedia)

At the moment, I do something like:

relation2 = SAMPLE relation1 0.05;

but my dataset contains a label column whose values include some rare ones (0.5% of rows, for example), and I would like my random downsampling not to drop all of them.

Thanks a lot.

There is 1 answer

**reo katoa** · Accepted Answer · 2014-01-03T20:52:35+00:00

You could implement your own sampling by generating a value with RANDOM() for each row and filtering out rows whose value falls below a threshold, say 0.95, which keeps roughly 5% of the rows. To stratify the sample, compute what fraction of your rows carries each label value, then scale the threshold (or the random value) per label so that different values are sampled at different rates. Rare labels can then be kept at a higher rate than common ones.
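A minimal sketch of that idea in Pig Latin, under some assumptions: the input path, the `id` field, and the target of about 1000 rows per label are hypothetical; only `relation1`, `relation2`, and a label column come from the question. It counts rows per label, derives a keep rate for each label, and filters each row against its label's rate.

```pig
-- Hypothetical input path and schema; adjust to the real data.
relation1 = LOAD 'input.tsv' AS (id:chararray, label:chararray);

-- Count how many rows carry each label.
grouped = GROUP relation1 BY label;
counts  = FOREACH grouped GENERATE group AS label, COUNT_STAR(relation1) AS n;

-- Per-label keep rate: labels with at most 1000 rows (an assumed target)
-- are kept whole; larger labels are thinned to roughly 1000 rows.
rates = FOREACH counts GENERATE label,
            (n <= 1000L ? 1.0 : 1000.0 / (double)n) AS rate;

-- Attach the rate to every row. 'replicated' assumes the rates relation
-- is small enough to fit in memory, which holds when there are few labels.
joined = JOIN relation1 BY label, rates BY label USING 'replicated';

-- Keep a row when its random draw falls at or below its label's rate.
sampled = FILTER joined BY RANDOM() <= rates::rate;

-- Drop the helper columns added by the join.
relation2 = FOREACH sampled GENERATE relation1::id AS id, relation1::label AS label;
```

The per-label counts are only approximate targets, since each row is kept independently at random. Note also that the join drops rows whose label is null, so those would need separate handling if they matter.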
