Partitioner or MultipleOutputs

Question

Asked by JohnRossy on 30 November 2013 at 05:35

I would like your opinion on Partitioner vs MultipleOutputs.
Suppose I have a file that contains keys such as:

0:aaa  
1:bbb  
0:ccc  
0:ddd  
...  
1:zzz

I would like to have 2 files: one containing the keys starting with 0: and the other containing the keys starting with 1:. Which approach should I use?
1) Use a custom Partitioner that parses the key and returns 0 or 1 from getPartition().
2) Use MultipleOutputs.write in the reduce phase, parsing the key and passing zero or one as the namedOutput parameter of MultipleOutputs.write.

Which one is better? To me, 1) seems better because each reducer deals with a single file.
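Option 1) can be sketched roughly as follows. This is only the key-parsing logic in plain Java, so it runs without Hadoop on the classpath; in a real job it would presumably sit inside `getPartition(KEY key, VALUE value, int numPartitions)` of a class extending `org.apache.hadoop.mapreduce.Partitioner`. The class name `PrefixPartition`, the method `partitionFor`, and the choice to send malformed keys to partition 0 are assumptions made for illustration, not part of the question.

```java
// Hypothetical helper: the decision a custom Partitioner would make per key.
final class PrefixPartition {
    private PrefixPartition() {}

    /**
     * Maps a key such as "0:aaa" to a partition index in [0, numPartitions).
     * Assumption: keys without a numeric prefix before ':' go to partition 0.
     */
    static int partitionFor(String key, int numPartitions) {
        int colon = key.indexOf(':');
        if (colon <= 0) {
            return 0; // no prefix found
        }
        int prefix;
        try {
            prefix = Integer.parseInt(key.substring(0, colon));
        } catch (NumberFormatException e) {
            return 0; // non-numeric prefix
        }
        // floorMod keeps the result valid even if more prefixes appear
        // than there are reducers, or if the prefix is negative.
        return Math.floorMod(prefix, numPartitions);
    }
}
```

With 2 reducers, `partitionFor("0:aaa", 2)` and `partitionFor("1:bbb", 2)` land in different partitions. A third prefix such as `2:` would wrap around and share a partition with `0:` unless the number of reducers is raised.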

There are 2 answers

Arun Poreddy On 07 August 2014 at 18:17

When you say the first option is better, that means you are bound to 2 values. If another key prefix shows up, you would need to change your partitioner or the configuration to set 3 reducers, so the better idea is to use MultipleOutputs.

**Chris White** · Accepted Answer · 2013-12-01T21:10:06+00:00

If your job is only to split the input files into 2 parts, then MultipleOutputs is a better bet, as you can skip the shuffle / sort phase entirely by running a map-only job.

Now if you have lots of input files and don't want 2x as many output files as input files, then the partitioner-based approach will allow you to consolidate the input files into 2 outputs. They won't be nicely named, however, which is another benefit of MultipleOutputs. You can easily fix this by using MultipleOutputs in your reducer together with LazyOutputFormat, which ensures that the empty part-r files won't be written as output.

So, to answer: it depends on how many input files you have and how many output files you want.
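For the MultipleOutputs route, the per-record decision is which named output to write to. A minimal sketch of that decision in plain Java, so it runs without Hadoop: the class `OutputNames`, the `group` naming scheme, and the `other` fallback are all hypothetical. The letters-and-digits check reflects my understanding that MultipleOutputs rejects named outputs containing other characters; treat that as an assumption to verify against your Hadoop version.

```java
// Hypothetical helper: picks the namedOutput a mapper or reducer would
// pass to MultipleOutputs.write for a given key.
final class OutputNames {
    private OutputNames() {}

    // Assumed fallback for keys without a usable prefix.
    static final String FALLBACK = "other";

    /** Maps "0:aaa" to "group0", "1:zzz" to "group1", and so on. */
    static String namedOutputFor(String key) {
        int colon = key.indexOf(':');
        if (colon <= 0) {
            return FALLBACK; // no prefix before ':'
        }
        String prefix = key.substring(0, colon);
        for (int i = 0; i < prefix.length(); i++) {
            char ch = prefix.charAt(i);
            boolean alphanumeric = (ch >= 'a' && ch <= 'z')
                    || (ch >= 'A' && ch <= 'Z')
                    || (ch >= '0' && ch <= '9');
            if (!alphanumeric) {
                return FALLBACK; // keep the name to letters and digits
            }
        }
        return "group" + prefix;
    }
}
```

In an actual job, each name returned here would need to be registered up front with `MultipleOutputs.addNamedOutput`, the map-only variant would set `job.setNumReduceTasks(0)`, and `LazyOutputFormat.setOutputFormatClass` would suppress the empty default part files.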
