I have an acid enabled, partitioned, bucketed hive table to which I am writing using a streaming client. I see that several delta files are created as the records are written into partitions. I wanted to enable auto-compaction and tried the following base and specific params:
hive.support.concurrency=true
hive.enforce.bucketing=true
hive.exec.dynamic.partition.mode=nonstrict
hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
hive.compactor.initiator.on=true
hive.compactor.worker.threads=1
with,
hive.compactor.initiator.on=true
hive.compactor.cleaner.run.interval=5000ms
hive.compactor.delta.num.threshold=10
hive.compactor.delta.pct.threshold=0.1f
hive.compactor.abortedtxn.threshold=1000
hive.compactor.initiator.failed.compacts.threshold=2
hive.compactor.abortedtxn.threshold=1000
I did the above with hopes of enabling major compaction. However I see that major compaction is triggered automatically only once. i.e, Major compaction runs once and creates a base file. Once a base file is created for a number of delta files within that partition, Major compaction is not scheduled further, despite more delta files streamed into the partition since. How do I enable auto-Major compaction for a table? Has anyone faced similar issues before?
I have the same issue and the only solution I found is run manual compaction for each partition.
I still trying to figure out why happens.