MarkLogic: How to speed up rebalancing process when adding a new forest to an existing database?


Our production MarkLogic database holds 1.2 TB of data spread across 6 forests. We plan to add 2 new forests to reduce the number of stands per forest.

Adding new forests starts rebalancing the data. That's okay, and we expect it to take time. But the estimated rebalancing time shoots up whenever merges start alongside rebalancing. Sometimes the estimate suddenly jumps from 8 hours to 16 hours. On average, the whole process takes approximately 24 hours.

My question: if we disable merging before adding the new forests and run a manual merge soon after rebalancing completes, would the combined process be faster? And would it be safe to do this?


There are 2 answers

0
asusu On

In addition to the other info provided, the assignment policy may affect how much work is done; see, for example, https://docs.marklogic.com/guide/admin/database-rebalancing#id_81616 . You can also lower the rebalancer throttle to make it work more slowly if the system is getting overwhelmed. But if you turn off merging while rebalancing, I'd bet you'll hit a TOOMANYSTANDS error pretty quickly: the rebalancer has to write many small stands, and with merging off they can't be merged into fewer, larger stands.
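As a rough illustration of why the amount of data moved matters, here is a back-of-the-envelope calculation for the numbers in the question. It assumes the data ends up evenly spread across all forests and that only the new forests need to be filled, which is the best case; a policy that also reshuffles documents between the existing forests would move more. The helper `min_data_to_move` is hypothetical and is not part of any MarkLogic API.

```python
def min_data_to_move(total_gb: float, old_forests: int, new_forests: int) -> float:
    """Least data (in GB) that must move to reach an even spread after adding forests.

    Assumption: the only movement is from existing forests into the added ones.
    """
    added = new_forests - old_forests
    if old_forests <= 0 or added < 0:
        raise ValueError("need at least one existing forest and a non-negative number of added forests")
    # Every forest ends up with total_gb / new_forests, so the added forests
    # together must receive that share times the number of added forests.
    return total_gb * added / new_forests


total_gb = 1200  # 1.2 TB from the question, using decimal units for simplicity

print("per forest before:", total_gb / 6)   # 6 forests
print("per forest after: ", total_gb / 8)   # 8 forests
print("minimum to move:  ", min_data_to_move(total_gb, 6, 8))
```

In this best case about a quarter of the database has to be rewritten into the new forests, and every document that moves also creates merge work on both the source and the target forest.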

1
Mike Gardner On

Anything that affects disk I/O will affect the speed of rebalancing, including merging and standard database activity. However, take care if you disable merging.

The risk of disabling merging is that you prevent the system from reducing the number of stands. If too many stands accumulate, you may hit the hard limit, which will impact server operation.

If merging is having such a heavy impact, you can look at tuning the merge configuration instead of disabling it. More information can be found in the documentation.
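For anyone who wants to script such a change, here is a minimal sketch that only builds a request for the Management REST API's database properties endpoint; it does not send anything. Treat every specific as an assumption to check against the documentation for your version: the property names (`merge-priority`, `merge-max-size`), the values shown (placeholders, not recommendations), port 8002, and the database name `Documents`. The helper `build_properties_request` is hypothetical.

```python
import json


def build_properties_request(host, database, properties, port=8002):
    """Return (url, json_body) for a PUT to the database properties endpoint.

    Nothing is sent here; pass the result to the HTTP client of your choice,
    with authentication, after reviewing the payload.
    """
    url = f"http://{host}:{port}/manage/v2/databases/{database}/properties"
    body = json.dumps(properties, sort_keys=True)
    return url, body


# Placeholder values: verify names and allowed values in the docs before use.
merge_settings = {
    "merge-priority": "lower",   # assumed property name and value
    "merge-max-size": 32768,     # assumed property name; size in MB
}

url, body = build_properties_request("localhost", "Documents", merge_settings)
print("PUT", url)
print(body)
```

Building the payload separately from sending it makes it easy to review exactly what will change on a production database before applying it.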