MarkLogic scheduled backup fails because backup operation already in progress


We are running a 3-node cluster (MarkLogic 8.0-5) on EC2 (CentOS servers). We have nightly backups to S3 scheduled for all the databases.

For some of the databases, such as Security, I see the following error in the log:

2016-12-23 05:00:08.820 Info: Starting 2-forest database backup to s3://[bucket]/Security/20161223-0500088204840, jobid=17882056419810225406 (Daily event scheduled every 1 day at T05:00:00.0Z)

2016-12-23 05:00:08.821 Error: 2-forest database backup to s3://[bucket]/Security/20161223-0500088204840, jobid=17882056419810225406, timestamp=18446744073709551615 failed: XDMP-FORESTOPIN: Forest Security has a backup operation in progress

I checked S3, and the backup for the Security database was not written to the bucket on 12/23, although the backup for that database succeeded on 12/22. Backups for other databases also succeeded on 12/23.

The "database-status" page in the admin console for Security shows that the last backup ran last night (2016-12-23T05:01:01.863573Z).

Additionally, the "forest-status" page for the Security forest shows that it was last backed up last night (December 23, 2016 5:01:02 AM).

This has been affecting all of the ancillary databases (Schemas, Documents, Modules) for at least the last two weeks.

Any ideas what may cause this? I can always open a ticket with support, but since these are ancillary databases and therefore not as critical, I wanted to check here first in case there is something obvious.

BTW: This may be related to Marklogic scheduled backups failing, but I couldn't tell because that question didn't provide enough details for comparison.

Answer by nosqldev (best answer)

I realized that the answer is covered in the MarkLogic Knowledgebase here: https://help.marklogic.com/Knowledgebase/Article/View/204/0/best-practicies-when-backing-up-multiple-databases-simultaneously

In this case, all of the databases were scheduled for backup at the same time, and each backup configuration included the auxiliary databases. In other words, the Security, Schemas, and Documents backups all started at 5 am, and each one also tried to back up the Security database. This caused a conflict: by the time a given job reached the Security forest, another job was likely already backing it up, which is the condition XDMP-FORESTOPIN reports.

I was able to replicate the situation on a local MarkLogic instance by scheduling backups of the Security and Schemas databases for the same time while accepting the default configuration, which backs up all the auxiliary databases. I could avoid the error by NOT accepting the default option of including the auxiliary databases in the backup.
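
To make the overlap easier to see, here is a minimal Python sketch that models the schedule and flags databases that more than one job would back up at the same start time. It does not call any MarkLogic API: the BackupJob class, its fields, and the auxiliary relationships listed for each database are assumptions for illustration only, and treating "same start time" as "overlapping" is a simplification.

```python
from collections import defaultdict
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class BackupJob:
    """Hypothetical model of one scheduled backup (not a MarkLogic API)."""
    database: str
    start_time: str                 # e.g. "05:00"
    auxiliary: tuple = ()           # auxiliary databases assumed for this database
    include_auxiliary: bool = True  # mirrors the default described in the answer

    def databases_touched(self):
        """Return every database this job will try to back up."""
        touched = {self.database}
        if self.include_auxiliary:
            touched.update(self.auxiliary)
        return touched


def find_conflicts(jobs):
    """Map (start_time, database) to the jobs that back it up at that time.

    Only entries claimed by more than one job are returned.
    """
    by_slot = defaultdict(list)
    for job in jobs:
        for db in job.databases_touched():
            by_slot[(job.start_time, db)].append(job.database)
    return {slot: sorted(owners) for slot, owners in by_slot.items() if len(owners) > 1}


# Illustrative schedule: every job runs at 05:00 and includes its auxiliary databases.
jobs = [
    BackupJob("Security", "05:00"),
    BackupJob("Schemas", "05:00", auxiliary=("Security",)),
    BackupJob("Documents", "05:00", auxiliary=("Security", "Schemas")),
]

conflicts = find_conflicts(jobs)
for (start, db), owners in sorted(conflicts.items()):
    print(f"{start}: {db} is backed up by {', '.join(owners)}")

# Same schedule, but with the auxiliary databases excluded from every job.
standalone = [replace(job, include_auxiliary=False) for job in jobs]
print("conflicts without auxiliary databases:", find_conflicts(standalone))
```

With the auxiliary databases included, the Security database is claimed by all three jobs in this sketch; with them excluded, each job touches only its own database and the conflict map is empty, which matches what I saw when reproducing the problem locally.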