How to create small files while inserting data into a Hive ORC table using Tez


I have tried a few options, but I have only found config settings that merge small files into bigger ones, like the ones below, not the reverse. I am looking to create files of about 150 KB each.

set hive.merge.tezfiles=true;
set hive.merge.smallfiles.avgsize=128000;
set hive.merge.size.per.task=128000;

There are 2 answers

Chetna C On

You can try setting the ORC block size with hive.exec.orc.default.block.size. Also, to skip the merging of small files, you will need to disable the merge flag: set hive.merge.tezfiles=false; You can refer to the Hortonworks community thread for more information on how files are generated.
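
As a minimal sketch of the two settings mentioned in this answer, assuming a target of roughly 150 KB per file as in the question (the value 153600 is an illustrative assumption, not a tested recommendation):

```sql
-- Turn off the post-job merge step on Tez so small output files are left as written
set hive.merge.tezfiles=false;
-- ORC block size in bytes; 153600 = 150 * 1024 is an assumed value matching the 150 KB target
set hive.exec.orc.default.block.size=153600;
```

The size of the files that actually get written also depends on how many tasks write output and on how well the data compresses, so treat the number as a starting point to tune.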

Chetna C On

Try the settings below; they should help keep the output files small:

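-- Merge step stays enabled, but with targets of about 128 KB (all sizes are in bytes)
set hive.merge.tezfiles=true;
set hive.merge.smallfiles.avgsize=128000;
set hive.merge.size.per.task=128000;
-- Lower and upper bounds on the input split size, in bytes
set mapreduce.input.fileinputformat.split.minsize=100;
set mapreduce.input.fileinputformat.split.maxsize=128000;
-- ORC block size, in bytes
set hive.exec.orc.default.block.size=128000;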