Ok, I am trying to understand postgres HA using etcd and patroni. Here is my setup.
postgres node 1: 10.0.10.225
postgres node 2: 10.0.10.24
etcd server: 10.0.10.67
haproxy server: 10.0.10.88
On the postgres server, Patroni is set up like this:
scope: postgres
namespace: /db/
name: pg-node1
restapi:
  listen: 10.0.10.225:8008
  connect_address: 10.0.10.225:8008
etcd:
  host: 10.0.10.67:2379
bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    postgresql:
      use_pg_rewind: true
  initdb:
    - encoding: UTF8
    - data-checksums
  pg_hba:
    - host replication replicator 127.0.0.1/32 md5
    - host replication replicator 10.0.10.225/0 md5
    - host replication replicator 10.0.10.24/0 md5
    - host all all 0.0.0.0/0 md5
  users:
    admin:
      password: admin
      options:
        - createrole
        - createdb
postgresql:
  listen: 10.0.10.225:5432
  connect_address: 10.0.10.225:5432
  data_dir: /data/patroni
  pgpass: /tmp/pgpass
  authentication:
    replication:
      username: replicator
      password: "mypassword"
    superuser:
      username: suksh
      password: "myotherpassword"
  parameters:
    unix_socket_directories: '.'
tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false
  nosync: false
On my etcd server I have this configuration:
# This is the configuration file for the etcd server.
# Human-readable name for this member.
name: 'default'
# Path to the data directory.
data-dir: /var/lib/etcd
# Path to the dedicated wal directory.
wal-dir:
# Number of committed transactions to trigger a snapshot to disk.
snapshot-count: 10000
# Time (in milliseconds) of a heartbeat interval.
heartbeat-interval: 100
# Time (in milliseconds) for an election to timeout.
election-timeout: 1000
# Raise alarms when backend size exceeds the given quota. 0 means use the
# default quota.
quota-backend-bytes: 0
# List of comma separated URLs to listen on for peer traffic.
listen-peer-urls: http://10.0.10.67:2380,http://127.0.0.1:7001
# List of comma separated URLs to listen on for client traffic.
listen-client-urls: http://127.0.0.1:2379, http://10.0.10.67:2379
# Maximum number of snapshot files to retain (0 is unlimited).
max-snapshots: 5
# Maximum number of wal files to retain (0 is unlimited).
max-wals: 5
# Comma-separated white list of origins for CORS (cross-origin resource sharing).
cors:
# List of this member's peer URLs to advertise to the rest of the cluster.
# The URLs needed to be a comma-separated list.
initial-advertise-peer-urls: http://10.0.10.67:2380
# List of this member's client URLs to advertise to the public.
# The URLs needed to be a comma-separated list.
advertise-client-urls: http://10.0.10.67:2379
# Discovery URL used to bootstrap the cluster.
discovery:
# Valid values include 'exit', 'proxy'
discovery-fallback: 'proxy'
# HTTP proxy to use for traffic to discovery service.
discovery-proxy:
# DNS domain used to bootstrap initial cluster.
discovery-srv:
# Comma separated string of initial cluster configuration for bootstrapping.
# Example: initial-cluster: "infra0=http://10.0.1.10:2380,infra1=http://10.0.1.11:2380,infra2=http://10.0.1.12:2380"
initial-cluster:
# Initial cluster token for the etcd cluster during bootstrap.
initial-cluster-token: 'etcd-cluster'
# Initial cluster state ('new' or 'existing').
initial-cluster-state: 'new'
# Reject reconfiguration requests that would cause quorum loss.
strict-reconfig-check: false
# Enable runtime profiling data via HTTP server
enable-pprof: true
# Valid values include 'on', 'readonly', 'off'
proxy: 'off'
# Time (in milliseconds) an endpoint will be held in a failed state.
proxy-failure-wait: 5000
# Time (in milliseconds) of the endpoints refresh interval.
proxy-refresh-interval: 30000
# Time (in milliseconds) for a dial to timeout.
proxy-dial-timeout: 1000
# Time (in milliseconds) for a write to timeout.
proxy-write-timeout: 5000
# Time (in milliseconds) for a read to timeout.
proxy-read-timeout: 0
client-transport-security:
  # Path to the client server TLS cert file.
  cert-file:
  # Path to the client server TLS key file.
  key-file:
  # Enable client cert authentication.
  client-cert-auth: false
  # Path to the client server TLS trusted CA cert file.
  trusted-ca-file:
  # Client TLS using generated certificates
  auto-tls: false
peer-transport-security:
  # Path to the peer server TLS cert file.
  cert-file:
  # Path to the peer server TLS key file.
  key-file:
  # Enable peer client cert authentication.
  client-cert-auth: false
  # Path to the peer server TLS trusted CA cert file.
  trusted-ca-file:
  # Peer TLS using generated certificates.
  auto-tls: false
# The validity period of the self-signed certificate, the unit is year.
self-signed-cert-validity: 1
# Enable debug-level logging for etcd.
log-level: debug
logger: zap
# Specify 'stdout' or 'stderr' to skip journald logging even when running under systemd.
log-outputs: [stderr]
# Force to create a new one member cluster.
force-new-cluster: false
auto-compaction-mode: periodic
auto-compaction-retention: "1"
# Limit etcd to a specific set of tls cipher suites
cipher-suites: [
  TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,
  TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
]
# Limit etcd to specific TLS protocol versions
tls-min-version: 'TLS1.2'
tls-max-version: 'TLS1.3'
When I start the patroni service, I get this in the log:
Mar 01 20:37:57 pg-node1 patroni[2005]: 2024-03-01 20:37:57,811 ERROR: Failed to get list of machines from http://10.0.10.67:2379/v2: EtcdException('Bad response : 404 page not found\n')
Mar 01 20:37:57 pg-node1 patroni[2005]: 2024-03-01 20:37:57,811 INFO: waiting on etcd
Mar 01 20:38:02 pg-node1 patroni[2005]: 2024-03-01 20:38:02,812 ERROR: Failed to get list of machines from http://10.0.10.67:2379/v2: EtcdException('Bad response : 404 page not found\n')
Mar 01 20:38:02 pg-node1 patroni[2005]: 2024-03-01 20:38:02,813 INFO: waiting on etcd
Mar 01 20:38:07 pg-node1 patroni[2005]: 2024-03-01 20:38:07,814 ERROR: Failed to get list of machines from http://10.0.10.67:2379/v2: EtcdException('Bad response : 404 page not found\n')
Mar 01 20:38:07 pg-node1 patroni[2005]: 2024-03-01 20:38:07,814 INFO: waiting on etcd
Mar 01 20:38:12 pg-node1 patroni[2005]: 2024-03-01 20:38:12,816 ERROR: Failed to get list of machines from http://10.0.10.67:2379/v2: EtcdException('Bad response : 404 page not found\n')
Mar 01 20:38:12 pg-node1 patroni[2005]: 2024-03-01 20:38:12,816 INFO: waiting on etcd
Mar 01 20:38:17 pg-node1 patroni[2005]: 2024-03-01 20:38:17,817 ERROR: Failed to get list of machines from http://10.0.10.67:2379/v2: EtcdException('Bad response : 404 page not found\n')
Mar 01 20:38:17 pg-node1 patroni[2005]: 2024-03-01 20:38:17,817 INFO: waiting on etcd
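To narrow this down, the request Patroni is making can be reproduced by hand from the postgres node. The following is only a minimal sketch: it assumes the etcd client URL from the configuration above, that the "list of machines" call in the log corresponds to a GET on /v2/machines, and that etcd's /version endpoint is a fair reachability check. The helper names (probe_url, http_status, describe) are made up for illustration. If /version answers but /v2/machines returns 404, that would suggest the server is reachable but is not serving the v2 API path Patroni is asking for.

```python
import urllib.error
import urllib.request

# Assumption: client URL taken from listen-client-urls in the etcd config above.
ETCD_BASE = "http://10.0.10.67:2379"


def probe_url(base, path):
    """Join a base URL and a path with exactly one slash between them."""
    return base.rstrip("/") + "/" + path.lstrip("/")


def http_status(url, timeout=2.0):
    """Return the HTTP status code of a GET on url, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status such as 404.
        return exc.code
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, and so on.
        return None


def describe(status):
    """Turn a status code (or None) into a short human-readable verdict."""
    if status is None:
        return "unreachable"
    if status == 404:
        return "reachable, but this path is not served (404)"
    if 200 <= status < 300:
        return "served"
    return f"unexpected status {status}"


if __name__ == "__main__":
    # /version is a general health probe; /v2/machines is the path that is
    # assumed to be behind the "list of machines" error in the Patroni log.
    for path in ("/version", "/v2/machines"):
        url = probe_url(ETCD_BASE, path)
        print(url, "->", describe(http_status(url)))
```

Running this on pg-node1 rather than on the etcd host also exercises the network path between the two machines, which is the path Patroni itself uses.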
What have I set up wrong here? Thanks in advance.