Hive 3.1 - Automatic major compaction triggered only once per partition


I have an ACID-enabled, partitioned, bucketed Hive table that I write to with a streaming client. As records are written into the partitions, several delta files are created. I wanted to enable automatic compaction, so I set the following base parameters:

    hive.support.concurrency=true 
    hive.enforce.bucketing=true 
    hive.exec.dynamic.partition.mode=nonstrict 
    hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager 
    hive.compactor.initiator.on=true 
    hive.compactor.worker.threads=1

along with these compactor-specific parameters:

    hive.compactor.initiator.on=true
    hive.compactor.cleaner.run.interval=5000ms
    hive.compactor.delta.num.threshold=10
    hive.compactor.delta.pct.threshold=0.1f
    hive.compactor.abortedtxn.threshold=1000
    hive.compactor.initiator.failed.compacts.threshold=2

I set these hoping to enable automatic major compaction. However, major compaction is triggered automatically only once per partition: it runs once and creates a base file. Once that base file exists, no further major compaction is scheduled, even though more delta files have since been streamed into the partition. How do I enable automatic major compaction for a table? Has anyone faced a similar issue?


There is 1 answer

AlvSel On

I have the same issue, and the only solution I have found is to run a manual compaction for each partition:

    ALTER TABLE myTable PARTITION (myPartitionColumn='myPartitionValue') COMPACT 'major';

I am still trying to figure out why this happens.
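As a rough sketch of how the manual workaround could be checked, the statements below list the partitions, queue a major compaction for one of them, and then inspect the compaction queue. The table, column, and value names are the placeholders from the statement above, not real objects. The last statement is an assumption rather than a confirmed fix: the Hive documentation describes `hive.compactor.delta.num.threshold` as the trigger for minor compaction and `hive.compactor.delta.pct.threshold` (delta size relative to the base) as the trigger for major compaction, so once a base file exists it may be worth checking whether the new deltas actually exceed that fraction of the base, and a per-table override is one way to experiment with it.

```sql
-- Placeholder names (myTable, myPartitionColumn, myPartitionValue) are illustrative only.

-- List the partitions so that each one can be compacted in turn.
SHOW PARTITIONS myTable;

-- Queue a major compaction for a single partition.
-- By default this only enqueues the request; a compactor worker picks it up later.
ALTER TABLE myTable PARTITION (myPartitionColumn='myPartitionValue') COMPACT 'major';

-- Inspect the compaction queue: each row shows the table, partition,
-- compaction type, and current state of a request.
SHOW COMPACTIONS;

-- Assumption: override the major-compaction threshold for this table only,
-- using the documented "compactorthreshold." table-property prefix.
-- The value 0.05 is an arbitrary example, not a recommendation.
ALTER TABLE myTable SET TBLPROPERTIES (
  'compactorthreshold.hive.compactor.delta.pct.threshold'='0.05'
);
```

If `SHOW COMPACTIONS` shows only minor compactions for the partition after the first base file appears, that would be consistent with the threshold behavior described above, though it does not rule out other causes.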