I have a folder in my HDFS system that contains text files compressed with the Snappy codec.
Normally, when reading GZIP-compressed files in a Hadoop Streaming job, decompression happens automatically. With Snappy-compressed data, however, it does not, and I cannot process the files.
How can I read these files and process them in Hadoop Streaming?
Many thanks in advance.
UPDATE:
If I use the command hadoop fs -text file, it works. The problem occurs only with Hadoop Streaming: the data is not decompressed before it is passed to my Python script.
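To show what I tried: the following checks whether the Snappy codec is registered and whether a minimal streaming job decompresses the input before the mapper sees it. The io.compression.codecs property, hdfs getconf, and the streaming options are standard Hadoop; the jar location and HDFS paths are assumptions for my setup.

```shell
# 1) Confirm the Snappy codec is registered (property name is a standard
#    Hadoop setting; the expected class is org.apache.hadoop.io.compress.SnappyCodec):
hdfs getconf -confKey io.compression.codecs

# 2) Run a minimal streaming job with an identity mapper to see whether
#    the framework decompresses the input before handing it to the mapper.
#    The jar path and the HDFS paths below are assumptions.
hadoop jar "$HADOOP_HOME"/share/hadoop/tools/lib/hadoop-streaming-*.jar \
  -input /data/input-snappy \
  -output /tmp/snappy-test-out \
  -mapper /bin/cat \
  -reducer NONE
# If the job output contains raw binary instead of text,
# the input was not decompressed for the mapper.
```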
I think I have an answer to the problem. It would be great if someone can confirm this.
While browsing the Cloudera blog, I found an article explaining the Snappy codec. As it states:
Therefore, a file compressed in HDFS with the Snappy codec can be read with
hadoop fs -text
but not in a Hadoop Streaming (MapReduce) job.
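If that is confirmed, a workaround I am considering is to decompress the files outside the job with hadoop fs -text (which applies the registered codec) and re-upload them as plain text, so the streaming job can read them directly. The paths and filenames below are illustrative.

```shell
# Decompress one Snappy file to a local plain-text copy;
# -text applies the codec matching the .snappy extension.
hadoop fs -text /data/input-snappy/part-00000.snappy > part-00000.txt

# Upload the decompressed copy to a directory the streaming job will read.
hadoop fs -mkdir -p /data/input-plain
hadoop fs -put part-00000.txt /data/input-plain/
```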