mapreduce job not setting compression codec correctly


Hi, I have an MR2 job that takes Avro data compressed with Snappy as input, processes it, and writes the result as Avro data to an output directory. The expectation is that this output Avro data should also be Snappy-compressed, but it is not. The MR job is map-only.

I have set the following properties in my code

conf.set("mapreduce.map.output.compress", "true");
conf.set("mapreduce.map.output.compress.codec", "org.apache.hadoop.io.compress.SnappyCodec");

But the output is still not Snappy-compressed.


There are 3 answers

Vikas Saxena (BEST ANSWER)

The following did the trick:

FileOutputFormat.setCompressOutput(job, true);
FileOutputFormat.setOutputCompressorClass(job, org.apache.hadoop.io.compress.SnappyCodec.class);

Please note that this has to be done before setting the output path, and in the same order as shown above.
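To show how the two calls above fit into a full driver, here is a minimal sketch of a map-only job applying the fix. The class name, job name, and argument-based paths are placeholders for illustration, not from the original post; mapper and input/output format classes would be set as appropriate for your Avro data.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.compress.SnappyCodec;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SnappyOutputDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "snappy-output-example");
        job.setNumReduceTasks(0);  // map-only job, as in the question

        // Enable Snappy compression of the final job output first...
        FileOutputFormat.setCompressOutput(job, true);
        FileOutputFormat.setOutputCompressorClass(job, SnappyCodec.class);

        // ...and only then set the input/output paths, per the note above.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```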

gwgyk

If you want to use Snappy, setting the parameter to org.apache.hadoop.io.compress.SnappyCodec is not enough. You also need to download Snappy from Google, build it, and copy the built native libraries into the Hadoop lib directory.

You can search Google for "how to use snappy on hadoop"; there is a post about it, but it was written in Chinese. link
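As a quick way to verify the point made above, recent Hadoop versions ship a built-in check that reports whether the native Snappy library can actually be loaded; this is an environment-dependent command, so the exact output will vary per installation.

```shell
# Lists the native libraries Hadoop can load; look for a line like
# "snappy: true /usr/lib/hadoop/lib/native/libsnappy.so" in the output.
# If it reports "snappy: false", the native library is missing.
hadoop checknative -a
```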

vefthym

What you have now is compression of the intermediate output of the map phase. Instead, you should use the following settings (see this presentation and especially slide 9 for more details):

job.setOutputFormatClass(SequenceFileOutputFormat.class);
conf.set("mapreduce.output.fileoutputformat.compress", "true");
conf.set("mapreduce.output.fileoutputformat.compress.codec", "org.apache.hadoop.io.compress.SnappyCodec");

or any alternatives you wish, but do not include the word "map" in these configuration names; otherwise they will apply to the intermediate map output instead.
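To make the distinction above concrete, here is a side-by-side sketch of the two property families; `conf` is assumed to be the job's Hadoop Configuration, as in the question.

```java
// Intermediate (map-phase) output compression -- this is what the
// question's original settings control; it does NOT affect final output:
conf.set("mapreduce.map.output.compress", "true");
conf.set("mapreduce.map.output.compress.codec",
         "org.apache.hadoop.io.compress.SnappyCodec");

// Final job output compression -- note there is no "map" in these names:
conf.set("mapreduce.output.fileoutputformat.compress", "true");
conf.set("mapreduce.output.fileoutputformat.compress.codec",
         "org.apache.hadoop.io.compress.SnappyCodec");
```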