Setting up replication in PostgreSQL is somewhat finicky. Luckily, repmgr can help you with that. This is surprising to some, given that repmgr is a HA tool. However, the HA part is totally optionally. You can use the repmgr CLI to setup, switchover and monitor the replication, without ever starting repmgrd for automatic failover. I mansplained this on mastodon, but it turned out to be still tricky to set up. Hence this post.
To follow along, you need two machines, which can talk to each other on tcp/22 and tcp/5432. My machines are running the Debian 12 image from Hetzner Cloud:
- s1 with the IP
2a01:4f9:c012:a320::1 - s2 with the IP
2a01:4f9:c012:5d90::1
On s1:
sudo apt update
sudo apt upgrade -y
sudo apt install postgresql-15 postgresql-15-repmgr -y
cat << EOF > /etc/repmgr.conf
node_id=1
node_name=s1
conninfo='host=2a01:4f9:c012:a320::1 user=repmgr dbname=repmgr port=5432'
data_directory='/var/lib/postgresql/15/main/'
pg_bindir='/usr/lib/postgresql/15/bin/'
log_file='/var/log/repmgr.log'
service_start_command = 'sudo systemctl start postgresql@15-main.service'
service_stop_command = 'sudo systemctl stop postgresql@15-main.service'
service_restart_command = 'sudo systemctl restart postgresql@15-main.service'
service_reload_command = 'sudo systemctl reload postgresql@15-main.service'
EOF
cat << EOF > /etc/sudoers.d/repmgr
Defaults:postgres !requiretty
postgres ALL = NOPASSWD: /usr/bin/systemctl stop postgresql@15-main.service, \
/usr/bin/systemctl start postgresql@15-main.service, \
/usr/bin/systemctl restart postgresql@15-main.service, \
/usr/bin/systemctl reload postgresql@15-main.service
EOF
cat << EOF >> /etc/postgresql/15/main/pg_hba.conf
host repmgr repmgr 2a01:4f9:c012:5d90::1/128 trust
host replication repmgr 2a01:4f9:c012:5d90::1/128 trust
host repmgr repmgr 2a01:4f9:c012:a320::1/128 trust
EOF
cat << EOF > /etc/postgresql/15/main/conf.d/custom.conf
listen_addresses = '*'
wal_log_hints = on
EOF
su - postgres << EOF
ssh-keygen -t ed25519 -q -f ~/.ssh/id_ed25519 -N ""
ssh-keyscan 2a01:4f9:c012:5d90::1 >> ~/.ssh/known_hosts
cat ~/.ssh/id_ed25519.pub
EOF
Note the public key in the last line of the output.
On s2 run the following, fill in the public key you noted down:
sudo apt update
sudo apt upgrade -y
sudo apt install postgresql-15 postgresql-15-repmgr -y
cat << EOF > /etc/repmgr.conf
node_id=2
node_name=s2
conninfo='host=2a01:4f9:c012:5d90::1 user=repmgr dbname=repmgr port=5432'
data_directory='/var/lib/postgresql/15/main/'
pg_bindir='/usr/lib/postgresql/15/bin/'
log_file='/var/log/repmgr.log'
service_start_command = 'sudo systemctl start postgresql@15-main.service'
service_stop_command = 'sudo systemctl stop postgresql@15-main.service'
service_restart_command = 'sudo systemctl restart postgresql@15-main.service'
service_reload_command = 'sudo systemctl reload postgresql@15-main.service'
EOF
cat << EOF > /etc/sudoers.d/repmgr
Defaults:postgres !requiretty
postgres ALL = NOPASSWD: /usr/bin/systemctl stop postgresql@15-main.service, \
/usr/bin/systemctl start postgresql@15-main.service, \
/usr/bin/systemctl restart postgresql@15-main.service, \
/usr/bin/systemctl reload postgresql@15-main.service
EOF
cat << EOF >> /etc/postgresql/15/main/pg_hba.conf
host repmgr repmgr 2a01:4f9:c012:a320::1/128 trust
host replication repmgr 2a01:4f9:c012:a320::1/128 trust
host repmgr repmgr 2a01:4f9:c012:5d90::1/128 trust
EOF
cat << EOF > /etc/postgresql/15/main/conf.d/custom.conf
listen_addresses = '*'
wal_log_hints = on
EOF
su - postgres << EOF
ssh-keygen -t ed25519 -q -f ~/.ssh/id_ed25519 -N ""
ssh-keyscan 2a01:4f9:c012:a320::1 >> ~/.ssh/known_hosts
echo '<PUBLIC_KEY_YOU_NOTED_DOWN>' > ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
cat ~/.ssh/id_ed25519.pub
EOF
Again, note down the key output in the last line.
On s1 run the following, fill in the public key you noted down:
su - postgres << EOF
echo '<PUBLIC_KEY_YOU_NOTED_DOWN>' > ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
EOF
Finally done with all the boilerplate. Tell repmgr to register s1 as the primary:
su - postgres << EOF
createuser -s repmgr
createdb repmgr -O repmgr
sudo systemctl restart postgresql@15-main.service
sleep 5
repmgr primary register
repmgr cluster show
EOF
This should give you something like:
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------
1 | s1 | primary | * running | | default | 100 | 1 | host=2a01:4f9:c012:a320::1 user=repmgr dbname=repmgr port=5432
Now use repmgr on s2 to setup streaming replication from s1 to s2:
su - postgres << EOF
createuser -s repmgr
createdb repmgr -O repmgr
sudo systemctl stop postgresql@15-main.service
sleep 5
repmgr standby clone -h 2a01:4f9:c012:a320::1 -U repmgr -d repmgr -p 5432 -F
sudo systemctl start postgresql@15-main.service
repmgr standby -F register
repmgr cluster show
EOF
This should give you something like:
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------
1 | s1 | primary | * running | | default | 100 | 1 | host=2a01:4f9:c012:a320::1 user=repmgr dbname=repmgr port=5432
2 | s2 | standby | running | s1 | default | 100 | 1 | host=2a01:4f9:c012:5d90::1 user=repmgr dbname=repmgr port=5432
Finally let’s try to switchover the primary from s1 to s2:
sudo -u postgres repmgr standby switchover
sudo -u postgres repmgr cluster show
Which should look something like this:
root@s2:~# sudo -u postgres repmgr standby switchover
NOTICE: executing switchover on node "s2" (ID: 2)
NOTICE: attempting to pause repmgrd on 2 nodes
NOTICE: local node "s2" (ID: 2) will be promoted to primary; current primary "s1" (ID: 1) will be demoted to standby
NOTICE: stopping current primary node "s1" (ID: 1)
NOTICE: issuing CHECKPOINT on node "s1" (ID: 1)
DETAIL: executing server command "sudo systemctl stop postgresql@15-main.service"
INFO: checking for primary shutdown; 1 of 60 attempts ("shutdown_check_timeout")
NOTICE: current primary has been cleanly shut down at location 0/3007660
NOTICE: promoting standby to primary
DETAIL: promoting server "s2" (ID: 2) using pg_promote()
NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
NOTICE: STANDBY PROMOTE successful
DETAIL: server "s2" (ID: 2) was successfully promoted to primary
NOTICE: node "s2" (ID: 2) promoted to primary, node "s1" (ID: 1) demoted to standby
NOTICE: switchover was successful
DETAIL: node "s2" is now primary and node "s1" is attached as standby
NOTICE: STANDBY SWITCHOVER has completed successfully
root@s2:~# sudo -u postgres repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------
1 | s1 | standby | running | s2 | default | 100 | 1 | host=2a01:4f9:c012:a320::1 user=repmgr dbname=repmgr port=5432
2 | s2 | primary | * running | | default | 100 | 2 | host=2a01:4f9:c012:5d90::1 user=repmgr dbname=repmgr port=5432
Some exercises left to the reader:
- Use a VPN like WireGuard for transport layer security
- Don’t trust IPs in pg_hba.conf, use proper authentication instead
- Use barman for backups
- Use repmgrd for automatic failover
- Use a loadbalancer like haproxy to have a stable address to connect to either the primary or one of the secondary DBs via health checks
- Monitor replication health, e.g. via the replication lag in
repmgr node status