Errors when recovering a Postgres cluster with pgbackrest from S3


I want to test restoring a cluster from a backup stored in S3, using pgbackrest.

I have: OpenShift 4.7, PGO image tag centos8-13.2-4.6.2, Postgres version 13.

How to reproduce:

Step 1: I install the Crunchy operator and create a cluster:

pgo create cluster example-db \
  --pgbouncer \
  --replica-count=1 \
  --password-superuser="%%%%%" \
  --password-replication="%%%%%" \
  --database test-db \
  --username test-user \
  --password="%%%%%%" \
  --pvc-size 20Gi \
  --pgbackrest-pvc-size 40Gi \
  --metrics \
  --pgbackrest-storage-type=s3 \
  --pgbackrest-s3-key=test-db-backup-rw \
  --pgbackrest-s3-key-secret=%%%%% \
  --pgbackrest-s3-bucket=test-db-backup \
  --pgbackrest-s3-endpoint=s3.my_site.com \
  --pgbackrest-s3-uri-style=path \
  --pgbackrest-s3-verify-tls=false

This works.

Step 2: I create a backup:

pgo backup example-db --backup-opts="--type=full --repo1-retention-full=3 --archive-timeout=300" --pgbackrest-storage-type=s3

That works too.

Step 3: I remove the cluster (to simulate its loss):

pgo delete cluster example-db

The backup still remains in S3.

Step 4: I try to restore this cluster by creating a standby cluster:

pgo create cluster standby-test-db \
  --standby \
  --pgbouncer \
  --replica-count=1 \
  --password-superuser="%%%%%" \
  --password-replication="%%%%" \
  --database test-db \
  --username test-user \
  --password="%%%%" \
  --pvc-size 20Gi \
  --pgbackrest-pvc-size 40Gi \
  --metrics \
  --pgbackrest-storage-type=s3 \
  --pgbackrest-s3-key=test-db-backup-rw \
  --pgbackrest-s3-key-secret=%%%%% \
  --pgbackrest-s3-bucket=test-db-backup \
  --pgbackrest-s3-endpoint=s3.my_site.com \
  --pgbackrest-s3-uri-style=path \
  --pgbackrest-s3-verify-tls=false \
  --pgbackrest-repo-path=/backrestrepo/example-db-backrest-shared-repo
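
For the standby to find the backup, `--pgbackrest-repo-path` has to point at the repository path of the original cluster, not the new one. A minimal sketch, assuming the path follows the `/backrestrepo/<cluster>-backrest-shared-repo` pattern seen in the command above (the helper name `repo_path_for` is hypothetical):

```shell
# Hypothetical helper: build the pgbackrest repo path for a source cluster,
# assuming the /backrestrepo/<cluster>-backrest-shared-repo pattern used above.
repo_path_for() {
  cluster_name="$1"
  if [ -z "$cluster_name" ]; then
    echo "usage: repo_path_for <source-cluster-name>" >&2
    return 1
  fi
  printf '/backrestrepo/%s-backrest-shared-repo\n' "$cluster_name"
}

# Example: pass the original cluster's name, not the standby's.
# pgo create cluster standby-test-db --standby ... \
#   --pgbackrest-repo-path="$(repo_path_for example-db)"
```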

I get the same warning repeated many times in my pod:

...
Tue Jun 22 16:30:16 UTC 2021 WARN: Detected an earlier failed attempt to initialize
Tue Jun 22 16:30:16 UTC 2021 INFO: Correct the issue, remove '/pgdata/standby-test-db.initializing', and try again
Tue Jun 22 16:30:16 UTC 2021 INFO: Your data might be in: /pgdata/standby-test-db_*
Tue Jun 22 16:30:26 UTC 2021 WARN: Detected an earlier failed attempt to initialize
Tue Jun 22 16:30:26 UTC 2021 INFO: Correct the issue, remove '/pgdata/standby-test-db.initializing', and try again
Tue Jun 22 16:30:26 UTC 2021 INFO: Your data might be in: /pgdata/standby-test-db_*
...

If I remove /pgdata/standby-test-db_*, the messages don't change. If I restart the pod, /pgdata/standby-test-db_* is created again.
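
Note that the log asks for the `.initializing` marker to be removed, not the `/pgdata/standby-test-db_*` directories. A minimal sketch of that step, meant to be run inside the database container and only after the underlying issue has been corrected, as the log itself says. The helper name `clear_init_marker` is hypothetical, and the log does not say whether the marker is a file or a directory, so the sketch handles both:

```shell
# Hypothetical helper: remove the "<pgdata>.initializing" marker named in the log.
# Argument: the data directory path without the suffix, e.g. /pgdata/standby-test-db
clear_init_marker() {
  if [ -z "$1" ]; then
    echo "usage: clear_init_marker <pgdata-path>" >&2
    return 1
  fi
  marker="$1.initializing"
  if [ -e "$marker" ]; then
    # -rf covers both a plain file and a directory (an assumption, see above)
    rm -rf -- "$marker"
    echo "removed $marker"
  else
    echo "no marker at $marker"
  fi
}

# Example (inside the pod):
# clear_init_marker /pgdata/standby-test-db
```

If the root cause is still present, the next initialization attempt will likely fail the same way and the marker will come back.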

I am completely confused with this problem.

1 answer

Answer by enix:

I researched this problem and it turned out that pgbackrest was not finding the correct full backup. I recreated a full backup and the problem was solved.
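
Before creating the standby, it can help to confirm that the repository actually lists a full backup. A minimal sketch, assuming the text output of `pgbackrest info` labels each full backup with a line beginning `full backup:`. The helper name `has_full_backup` is hypothetical, and the stanza name `db` in the usage comment is an assumption about how the operator names it:

```shell
# Hypothetical helper: read `pgbackrest info` text output on stdin and
# succeed only if at least one full backup is listed.
has_full_backup() {
  grep -q '^[[:space:]]*full backup:'
}

# Example, run where pgbackrest is configured for the S3 repository
# (the stanza name "db" is an assumption):
# pgbackrest info --stanza=db | has_full_backup || echo "no full backup found"
```

The check only looks at the listing, so it does not prove the backup is restorable, but it catches the case where no full backup is visible at all.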