I've got a setup with a SOLR main and a couple of client nodes (SOLR 5.2.1). The clients are set to replicate every hour. Every now and then the auto-replication breaks: some nodes stop picking up the main index. I can see the index version difference in the UI, so the client can reach the main node. I can also start the replication manually ("replicate now" works), so it is not a disk space or other communication issue. But it just doesn't replicate by itself anymore.
When this happens, the countdown to the next replication poll keeps resetting: if I refresh the page, it starts at 1:00:00 again. This doesn't happen on instances that do not suffer from this problem. I believe the issue sometimes goes away after a "replicate now", and the affected nodes start replicating again. I don't see anything out of the ordinary in the logs.
What would cause that replication timer to reset in the UI? How can we troubleshoot why it isn't even attempting to replicate by itself, even though it suggests it will in 1h?
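One way to look at this outside the UI is to ask each client's replication handler for its own view of the polling state. The sketch below is only that: the host, port, and core name are placeholders, and the field names read from the response (`pollInterval`, `nextExecutionAt`, `isPollingDisabled`, `isReplicating` under `details` and then `slave`) are what I'd expect from a 5.x `command=details` response, so check them against your own output. Note that Solr 5.x itself calls the two roles master and slave in its config and API.

```python
import json
import urllib.request
from urllib.parse import urlencode


def replication_url(base_url, core, command):
    """Build a ReplicationHandler URL for the given core and command."""
    query = urlencode({"command": command, "wt": "json"})
    return f"{base_url.rstrip('/')}/{core}/replication?{query}"


def polling_summary(details_response):
    """Extract the polling-related fields from a parsed command=details response.

    Assumption: the client-side fields live under details -> slave.
    Missing fields come back as None rather than raising.
    """
    slave = details_response.get("details", {}).get("slave", {})
    return {
        "pollInterval": slave.get("pollInterval"),
        "nextExecutionAt": slave.get("nextExecutionAt"),
        "isPollingDisabled": slave.get("isPollingDisabled"),
        "isReplicating": slave.get("isReplicating"),
    }


def fetch_details(base_url, core):
    """Fetch and parse command=details from one node (needs a reachable Solr)."""
    url = replication_url(base_url, core, "details")
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


# Example (placeholder host and core name):
# print(polling_summary(fetch_details("http://client-host:8983/solr", "mycore")))
```

Running this against a healthy client and a broken one a few minutes apart should show whether `nextExecutionAt` on the broken node keeps getting pushed out, which would match the resetting countdown in the UI.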
The child nodes' replication settings included a section that would let them also act as a parent to other nodes. Even though this shouldn't affect any nodes, simply removing that section from the child config and reloading the core makes them auto-replicate again.
Offending section that is now commented out: