How to uncompress a file while loading from HDFS to S3?

I have CSV files in LZO format in HDFS that I would like to load into S3 and then into Snowflake. Since Snowflake does not provide LZO compression for the CSV file format, I need to convert the files on the fly while loading them to S3.
Asked by Vishrant. There are 2 answers.

Answer by Vishrant:
This approach helped me convert from .lzo_deflate to a Snowflake-compatible output format, using a Hadoop streaming job that re-compresses the output with the gzip codec:
hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-0.20.2-cdh3u2.jar \
-Dmapred.output.compress=true \
-Dmapred.compress.map.output=true \
-Dmapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec \
-Dmapred.reduce.tasks=0 \
-input <input-path> \
-output $OUTPUT \
-mapper "cut -f 2"
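Once the streaming job has written gzip output on HDFS, the remaining steps are copying it to S3 and loading it into Snowflake. A minimal sketch of those two steps follows; it is not from the original answer, and the HDFS path, bucket, stage, and table names are all illustrative:

```python
# Sketch: build the distcp command and the Snowflake COPY statement for the
# gzip output of the streaming job. All paths and names are hypothetical.
import shlex

def distcp_cmd(hdfs_src: str, s3_dest: str) -> list:
    """Build the hadoop distcp argv to copy the gzip output from HDFS to S3.
    Execute it with subprocess.run(distcp_cmd(...), check=True) on a node
    with S3 credentials configured."""
    return ["hadoop", "distcp", hdfs_src, s3_dest]

def copy_into_sql(table: str, stage: str) -> str:
    """Build the Snowflake COPY INTO statement for gzip-compressed CSV files
    in an external stage pointing at the S3 location."""
    return (
        f"COPY INTO {table} FROM @{stage} "
        "FILE_FORMAT = (TYPE = CSV COMPRESSION = GZIP)"
    )

print(" ".join(shlex.quote(p) for p in
               distcp_cmd("/user/etl/output", "s3a://my-bucket/output/")))
print(copy_into_sql("my_table", "my_stage"))
```

Snowflake can also auto-detect gzip (`COMPRESSION = AUTO`), so the explicit setting here is just making the conversion done by the streaming job visible.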
Alternatively, you can use a Lambda function that decompresses the files as they land on S3. This article walks through the approach:
https://medium.com/@johnpaulhayes/how-extract-a-huge-zip-file-in-an-amazon-s3-bucket-by-using-aws-lambda-and-python-e32c6cf58f06
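A rough sketch of that Lambda shape, adapted from zip to this LZO use case: the bucket layout, key naming, and the LZO decompression helper are all hypothetical (Python has no standard-library LZO support, so a library such as python-lzo or an lzop binary would have to be bundled with the function), while boto3 is provided by the Lambda runtime:

```python
# Sketch of an S3-triggered Lambda that recompresses new LZO objects as gzip.
# Key layout and the LZO decompression step are assumptions, not a tested recipe.
import gzip

def gz_key(key: str) -> str:
    """Map an LZO object key to the key of its gzip counterpart."""
    for suffix in (".lzo_deflate", ".lzo"):
        if key.endswith(suffix):
            return key[: -len(suffix)] + ".gz"
    return key + ".gz"

def handler(event, context):
    """Triggered by s3:ObjectCreated; rewrites each new object as gzip."""
    import boto3  # provided by the AWS Lambda runtime
    s3 = boto3.client("s3")
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        raw = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        # Decompress the LZO payload with a bundled library or layer --
        # decompress_lzo is a hypothetical helper, not defined here:
        data = decompress_lzo(raw)
        s3.put_object(Bucket=bucket, Key=gz_key(key), Body=gzip.compress(data))
```

Note that Lambda's memory and 15-minute timeout limits make this practical only for moderately sized files; for very large CSVs the Hadoop streaming approach above scales better.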