PostgreSQL - using log shipping to incrementally update a remote read-only slave

3.4k views Asked by At

My company's website uses a PostgreSQL database. In our data center we have a master DB and a few read-only slave DB's, and we use Londiste for continuous replication between them.

I would like to setup another read-only slave DB for reporting purposes, and I'd like this slave to be in a remote location (outside the data center). This slave doesn't need to be 100% up-to-date. If it's up to 24 hours old, that's fine. Also, I'd like to minimize the load I'm putting on the master DB. Since our master DB is busy during the day and idle at night, I figure a good idea (if possible) is to get the reporting slave caught up once each night.

I'm thinking about using log shipping for this, as described on http://www.postgresql.org/docs/8.4/static/continuous-archiving.html

My plan is:

  1. Setup WAL archiving on the master DB
  2. Produce a full DB snapshot and copy it to the remote location
  3. Restore the DB and get it caught up
  4. Go into steady state where:
    • DAYTIME -- the DB falls behind but people can query it
    • NIGHT -- I copy over the day's worth of WAL files and get the DB caught up

Note: the key here is that I only need to copy a full DB snapshot one time. Thereafter I should only have to copy a day's worth of WAL files in order to get the remote slave caught up again.

Since I haven't done log-shipping before I'd like some feedback / advice.

Will this work? Does PostgreSQL support this kind of repeated recovery?

Do you have other suggestions for how to set up a remote semi-fresh read-only slave?

thanks!

--S

2

There are 2 answers

5
Charles On

9.0's built-in WAL streaming replication is designed to accomplish something that should meet your goals -- a warm or hot standby that can accept read-only queries. Have you considered using it, or are you stuck on 8.4 for now?

(Also, the upcoming 9.1 release is expected to include an updated/rewritten version of pg_basebackup, a tool for creating the initial backup point for a fresh slave.)


Update: PostgreSQL 9.1 will include the ability to pause and resume streaming replication with a simple function call on the slave.

1
johnbaum On

Your plan should work.
As Charles says, warm standby is another possible solution. It's supported since 8.2 and has relatively low performance impact on the primary server. Warm Standby is documented in the Manual: PostgreSQL 8.4 Warm Standby

The short procedure for configuring a standby server is as follows. For full details of each step, refer to previous sections as noted.

  1. Set up primary and standby systems as near identically as possible, including two identical copies of PostgreSQL at the same release level.
  2. Set up continuous archiving from the primary to a WAL archive located in a directory on the standby server. Ensure that archive_mode, archive_command and archive_timeout are set appropriately on the primary (see Section 24.3.1).
  3. Make a base backup of the primary server (see Section 24.3.2), and load this data onto the standby.
  4. Begin recovery on the standby server from the local WAL archive, using a recovery.conf that specifies a restore_command that waits as described previously (see Section 24.3.3).

To achieve only nightly syncs, your archive_command should exit with a non-zero exit status during daytime.

Additional Informations:

Postgres Wiki about Warm Standby

Blog Post Warm Standby Setup