I wanted to convert one day's Avro data (~2 TB) to Parquet.
I ran a Hive query and the data was successfully converted to Parquet.
But the data size became 6 TB.
What could have happened for the data to become three times the size?
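The original query is not shown, but an Avro-to-Parquet conversion in Hive typically looks something like the sketch below. Table and column names here are hypothetical, purely for illustration.

```sql
-- Hypothetical sketch of an Avro-to-Parquet conversion in Hive;
-- the table and column names are assumptions, not from the original post.
CREATE TABLE events_parquet (
  event_id STRING,
  event_ts BIGINT,
  payload  STRING
)
STORED AS PARQUET;

-- Copy one day's worth of rows from the Avro-backed table
INSERT OVERWRITE TABLE events_parquet
SELECT event_id, event_ts, payload
FROM events_avro;
```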
Typically, Parquet can be more efficient than Avro: because it is a columnar format, columns of the same type are stored adjacently on disk, which lets compression algorithms work more effectively in many cases. We typically use Snappy, which is sufficient, light on CPU, and has several properties that make it well suited to Hadoop compared with other codecs such as zip or gzip. Mainly, Snappy-compressed Parquet remains splittable, and each block retains the information needed to determine the schema. Parquet is a great format and we have been very happy with query performance after moving from Avro (we can also use Impala, which is super fast).
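As a minimal sketch of the point about Snappy, this is one way to request Snappy-compressed Parquet output from Hive when rewriting a table. The table names are hypothetical; `parquet.compression` is the standard property for Parquet output, but verify the behaviour against your Hive version.

```sql
-- Ask Hive to write Snappy-compressed Parquet for this session
SET parquet.compression=SNAPPY;

-- CTAS-style conversion; the table property makes the choice explicit
-- on the table itself as well (table names are hypothetical).
CREATE TABLE events_parquet_snappy
STORED AS PARQUET
TBLPROPERTIES ('parquet.compression'='SNAPPY')
AS
SELECT * FROM events_avro;
```

If the output came out uncompressed, explicitly setting the codec like this is usually the first thing worth checking when the Parquet copy ends up larger than the Avro source.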