Configuring PostgreSQL HA on RHEL 7.0


We are experiencing a problem configuring PostgreSQL for HA using Corosync and Pacemaker.

The output of crm_mon is:

Last updated: Thu Dec 18 10:24:04 2014
Last change: Thu Dec 18 10:16:30 2014 via crmd on umhtvappdpj05.arqiva.local 
Stack: corosync 
Current DC: umhtvappdpj06.arqiva.local (1) - partition with quorum 
Version: 1.1.10-29.el7-368c726 
2 Nodes configured 
4 Resources configured 


Online: [ umhtvappdpj05.arqiva.local umhtvappdpj06.arqiva.local ] 

Full list of resources: 

 Master/Slave Set: msPostgresql [pgsql] 
     Masters: [ umhtvappdpj06.arqiva.local ] 
     Slaves: [ umhtvappdpj05.arqiva.local ] 
 Resource Group: master-group 
     vip-master (ocf::heartbeat:IPaddr2):       Started umhtvappdpj06.arqiva.local 
     vip-rep    (ocf::heartbeat:IPaddr2):       Started umhtvappdpj06.arqiva.local 

Node Attributes: 
* Node umhtvappdpj05.arqiva.local: 
    + master-pgsql                      : -INFINITY 
    + pgsql-data-status                 : LATEST 
    + pgsql-status                      : HS:alone 
    + pgsql-xlog-loc                    : 0000000097000168 
* Node umhtvappdpj06.arqiva.local: 
    + master-pgsql                      : 1000 
    + pgsql-data-status                 : LATEST 
    + pgsql-master-baseline             : 0000000094000090 
    + pgsql-status                      : PRI 

Migration summary: 
* Node umhtvappdpj05.arqiva.local: 
* Node umhtvappdpj06.arqiva.local:

Here node 06 (umhtvappdpj06.arqiva.local) started as the primary and node 05 (umhtvappdpj05.arqiva.local) acts as the standby, but the two are not connected: the standby's pgsql-status is HS:alone and its master-pgsql score is -INFINITY, meaning it is running but not streaming from the primary.
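
A quick way to confirm this from the primary side (my addition, not in the original post) is to query pg_stat_replication on node 06; an empty result means no standby has ever connected:

# Run on the current primary (node 06); an empty result confirms
# the standby is not streaming. Binary path taken from the config below.
/usr/pgsql-9.3/bin/psql -U postgres -p 5432 \
    -c "SELECT client_addr, state, sync_state FROM pg_stat_replication;"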

The recovery.conf on node 05:
standby_mode = 'on' 
primary_conninfo = 'host=10.52.6.95 port=5432 user=postgres application_name=umhtvappdpj05.arqiva.local keepalives_idle=60 keepalives_interval=5 keepalives_count=5' 
restore_command = 'scp 10.52.6.85:/var/lib/pgsql/pg_archive/%f %p' 
recovery_target_timeline = 'latest'
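
Since primary_conninfo points at the replication VIP, it is also worth verifying from the standby that both the VIP and the restore_command path are reachable at all. A minimal check of my own (assuming the postgres user can authenticate and key-based SSH is set up):

# From node 05: confirm the replication VIP accepts connections.
/usr/pgsql-9.3/bin/psql -h 10.52.6.95 -p 5432 -U postgres -c "SELECT 1;"
# Confirm the archive directory used by restore_command is reachable.
ssh 10.52.6.85 ls /var/lib/pgsql/pg_archive | head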

The resources were created as follows:

pcs resource create vip-master IPaddr2 \ 
ip="10.52.6.94" \ 
nic="ens192" \ 
cidr_netmask="24" \ 
op start   timeout="60s" interval="0s"  on-fail="restart" \ 
op monitor timeout="60s" interval="10s" on-fail="restart" \ 
op stop    timeout="60s" interval="0s"  on-fail="block" 

pcs resource create vip-rep IPaddr2 \ 
ip="10.52.6.95" \ 
nic="ens192" \ 
cidr_netmask="24" \ 
meta migration-threshold="0" \ 
op start   timeout="60s" interval="0s"  on-fail="stop" \ 
op monitor timeout="60s" interval="10s" on-fail="restart" \ 
op stop    timeout="60s" interval="0s"  on-fail="ignore" 

pcs resource create pgsql ocf:heartbeat:pgsql \ 
pgctl="/usr/pgsql-9.3/bin/pg_ctl" \ 
psql="/usr/pgsql-9.3/bin/psql" \ 
pgdata="/pgdata/data" \ 
rep_mode="sync" \ 
node_list="10.52.6.85 10.52.6.92" \ 
restore_command="scp 10.52.6.85:/var/lib/pgsql/pg_archive/%f %p" \ 
master_ip="10.52.6.95" \ 
primary_conninfo_opt="keepalives_idle=60 keepalives_interval=5 keepalives_count=5" \ 
restart_on_promote='true' \ 
op start   timeout="60s" interval="0s"  on-fail="restart" \ 
op monitor timeout="60s" interval="10s" on-fail="restart" \ 
op monitor timeout="60s" interval="9s"  on-fail="restart" role="Master" \ 
op promote timeout="60s" interval="0s"  on-fail="restart" \ 
op demote  timeout="60s" interval="0s"  on-fail="stop" \ 
op stop    timeout="60s" interval="0s"  on-fail="block" \ 
op notify  timeout="60s" interval="0s" 
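
The Master/Slave set (msPostgresql) and the resource group (master-group) that appear in crm_mon are not shown being created above. Presumably commands along these lines were used (a sketch based on the typical pgsql resource-agent setup; the scores and options are my assumptions, not taken from the post):

pcs resource master msPostgresql pgsql \
    master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true

pcs resource group add master-group vip-master vip-rep

pcs constraint colocation add master-group with master msPostgresql INFINITY
pcs constraint order promote msPostgresql then start master-group symmetrical=false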

[root@umhtvappdpj05 data]# pcs resource show --all 
 Master/Slave Set: msPostgresql [pgsql] 
     Masters: [ umhtvappdpj06.arqiva.local ] 
     Slaves: [ umhtvappdpj05.arqiva.local ] 
 Resource Group: master-group 
     vip-master (ocf::heartbeat:IPaddr2):       Started 
     vip-rep    (ocf::heartbeat:IPaddr2):       Started 
[root@umhtvappdpj05 data]#

The only anomaly is that Corosync and Pacemaker were first installed on node 06 while it was on a different subnet from node 05; node 06 was later moved onto the same subnet as node 05. Could this be the cause? Would reinstalling on node 06 make sense?
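
If the subnet move is suspected, checking what Corosync currently believes about the ring is cheaper than reinstalling. These are standard corosync/pcs diagnostics, not from the original post:

corosync-cfgtool -s               # ring status and bound address on each node
corosync-cmapctl | grep members   # current membership as corosync sees it
pcs status corosync               # membership summary via pcs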

Thank you,

Sameer

1 Answer

Ajay Pratap

Here is how you can reconnect your replica to the primary PostgreSQL server.

1. Touch the PGSQL.lock file in /var/lib/pgsql/tmp/ on umhtvappdpj05.arqiva.local
2. Stop PostgreSQL on umhtvappdpj05.arqiva.local using systemctl
3. Take a base backup (or rsync) of the data directory from the primary server to the replica
4. Remove the PGSQL.lock file from the replica
5. Run pcs resource cleanup pgsql

These steps always work for me; a shell sketch of the procedure follows. Let me know if they don't.
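
A minimal sketch of the steps above, run on the replica (node 05). The data directory, primary address, and service name are assumed from the question and the PGDG 9.3 packaging; adjust them to your layout, and note that passwordless access for the postgres user is assumed:

# 1. Create the lock file so the pgsql resource agent will not start
#    PostgreSQL on the replica while we rebuild it.
touch /var/lib/pgsql/tmp/PGSQL.lock

# 2. Stop PostgreSQL on the replica (unit name assumed for PGDG 9.3).
systemctl stop postgresql-9.3

# 3. Re-seed the replica's data directory from the primary
#    (pg_basebackup requires an empty target directory).
rm -rf /pgdata/data/*
/usr/pgsql-9.3/bin/pg_basebackup -h 10.52.6.85 -U postgres \
    -D /pgdata/data -X stream -P
chown -R postgres:postgres /pgdata/data

# 4. Remove the lock file so the resource agent may start PostgreSQL again.
rm -f /var/lib/pgsql/tmp/PGSQL.lock

# 5. Clear the failure history so Pacemaker retries the resource.
pcs resource cleanup pgsql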