LZO files issue on S3

Question

Asked by user1570824 on 04 December 2015 at 10:52 (192 views)

I have 3 LZO compressed files and their corresponding index files in HDFS.

Permission  Owner  Group       Size       Replication  Block Size  Name
-rw-r--r--  alum   supergroup  0 B        3            128 MB      _SUCCESS
-rw-r--r--  alum   supergroup  192.29 MB  3            128 MB      part-00000.lzo
-rw-r--r--  alum   supergroup  89.56 KB   3            128 MB      part-00000.lzo.index
-rw-r--r--  alum   supergroup  243.09 MB  3            128 MB      part-00001.lzo
-rw-r--r--  alum   supergroup  106.67 KB  3            128 MB      part-00001.lzo.index
-rw-r--r--  alum   supergroup  163.99 MB  3            128 MB      part-00002.lzo
-rw-r--r--  alum   supergroup  70.54 KB   3            128 MB      part-00002.lzo.index

We copied these files to Amazon S3 and created a Hive external table over them for analytics.

Here are the problems we are facing:

1) The LZO index files are also being treated as data files, so meaningless data appears in the Hive table.
2) A "count(*)" query on the table runs with only 4 mappers, which suggests a problem with splitting.
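
To make the two symptoms concrete, here is a minimal Python sketch using the file listing from the question. It separates the LZO data files from the index and marker files by suffix, and gives a rough estimate of how many splits one might expect when the index files are honored versus ignored. The helper names (`is_data_file`, `estimated_splits`) are hypothetical, and the estimate assumes one split per 128 MB chunk. It ignores Hadoop's split slop and LZO block alignment, and it does not model whichever input format Hive is actually configured to use.

```python
import math

# Listing copied from the question: (name, size in MB).
FILES = [
    ("_SUCCESS", 0.0),
    ("part-00000.lzo", 192.29),
    ("part-00000.lzo.index", 89.56 / 1024),
    ("part-00001.lzo", 243.09),
    ("part-00001.lzo.index", 106.67 / 1024),
    ("part-00002.lzo", 163.99),
    ("part-00002.lzo.index", 70.54 / 1024),
]

BLOCK_SIZE_MB = 128  # block size shown in the HDFS listing


def is_data_file(name):
    """True for LZO data files; False for .index files and markers such as _SUCCESS."""
    return name.endswith(".lzo")


def estimated_splits(size_mb, block_mb=BLOCK_SIZE_MB):
    """Rough estimate (an assumption): one split per block-sized chunk, at least one."""
    return max(1, math.ceil(size_mb / block_mb))


data_files = [(name, size) for name, size in FILES if is_data_file(name)]

# With a usable index, each data file can be cut at roughly block-sized boundaries.
splits_if_indexed = sum(estimated_splits(size) for _, size in data_files)

# Without a usable index, an LZO file is not splittable: one split per file.
splits_if_unindexed = len(data_files)

print("data files:", [name for name, _ in data_files])
print("estimated splits with index:", splits_if_indexed)
print("splits without index:", splits_if_unindexed)
```

Under these assumptions only the three `.lzo` files count as data, and the estimate is 6 splits with the index versus 3 without it. A reader that does not apply the same suffix filter will read the `.index` files as rows, which is consistent with the first symptom.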

Could you please let me know what is going on with S3? The same setup works fine on our YARN cluster.

There is 1 answer

**Durga Viswanath Gadiraju** · Answer 1 · 2015-12-04T11:46:59+00:00

S3 is treated differently from HDFS, so split logic is not necessarily applied the way it is in HDFS. Remember that S3 is cloud object storage, whereas HDFS is a block-based file system. Your files are not stored as blocks in S3. This behavior is expected.
