Last modified: May 11, 2025
Master-Standby replication is a widely adopted database replication topology where a primary database server, known as the master, replicates data to one or more secondary servers called standbys. This setup enhances data availability, fault tolerance, and load balancing within a database system. Standby servers can handle read-only queries and, in case of a master server failure, can be promoted to become the new master, ensuring continuous operation.
To visualize how Master-Standby replication works, consider the following diagram:
                         +----------+----------+
                         |                     |
                         |       Clients       |
                         | (Write & Read Ops)  |
                         |                     |
                         +----------+----------+
                                    |
          +-------------------------+-------------------------+
          |                         |                         |
Write & Read Operations      Read Operations           Read Operations
          |                         |                         |
          v                         v                         v
+---------+-----------+   +---------+-----------+   +---------+-----------+
|                     |   |                     |   |                     |
|    Master Server    |   |  Standby Server 1   |   |  Standby Server 2   |
|                     |   |     (Read-Only)     |   |     (Read-Only)     |
+---------+-----------+   +---------+-----------+   +---------+-----------+
          |                         ^                         ^
          |                         |                         |
          |                    Replication               Replication
          |                         |                         |
          +-------------------------+-------------------------+
In this architecture, the master server handles all write operations, such as inserts, updates, and deletes. The standby servers continuously receive data changes from the master to stay synchronized and can serve read-only queries, offloading read traffic from the master. This arrangement not only improves performance but also provides a failover mechanism in case the master server becomes unavailable.
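Because clients must send every write to the master while reads can go to any node, it helps to be able to ask a server which role it is currently playing. PostgreSQL exposes this directly:

-- Returns false on the master, true on a standby that is replaying WAL
SELECT pg_is_in_recovery();

A write attempted on a standby fails with "ERROR: cannot execute INSERT in a read-only transaction", which client code can treat as a signal to retry against the master.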
Master-Standby replication serves several essential purposes in database systems: it keeps an up-to-date copy of the data on independent servers, spreads read traffic across multiple machines, and provides a standby ready to take over when the master fails. The main benefits follow directly: higher availability, better read throughput, and the ability to isolate reporting or backup workloads from the primary. Despite its advantages, the topology also has challenges: every write still funnels through a single master, asynchronous standbys can lag behind (so their reads may be slightly stale), and failover, whether manual or automated, must be carefully designed and tested.
PostgreSQL offers built-in support for streaming replication, making it a suitable choice for implementing Master-Standby replication. Below is a practical example of how to set up this replication using PostgreSQL.
Topology:
┌────────────────────────────────────────────────────────────────────────┐
│                        ┌────────────────────┐                          │
│                        │  Master / Primary  │                          │
│                        │  Node-1 10.0.0.10  │                          │
│                        └─────────┬──────────┘                          │
│                                  │                                     │
│                       (async or synchronous)                           │
│                                  │                                     │
│            ┌─────────────────────┴─────────────────────┐               │
│            │                                           │               │
│  ┌─────────┴──────────┐                      ┌─────────┴──────────┐    │
│  │     Replica-1      │                      │     Replica-2      │    │
│  │  Node-2 10.0.0.11  │      Streaming       │  Node-3 10.0.0.12  │    │
│  │   (Hot-Standby)    │<--- Replication ---> │   (Hot-Standby)    │    │
│  └────────────────────┘                      └────────────────────┘    │
└────────────────────────────────────────────────────────────────────────┘
Legend: We will call the nodes Node-1 (master), Node-2 (replica-1), and Node-3 (replica-2) throughout. Keep the diagram handy: the configuration snippets below reference the IPs exactly as shown.
Before you begin, make sure the groundwork is in place on all three nodes:

- Install the same PostgreSQL major version on every node (the standbys are cloned from the master with pg_basebackup).
- Keep system clocks synchronized with chrony/ntpd; replication breaks if clocks drift.
- Tune the kernel for a database workload: vm.swappiness = 1, set kernel.shmmax ≥ shared_buffers, etc.

Before setting up physical replication, you need to prepare the primary server (Node-1) by adjusting its configuration to enable write-ahead logging (WAL) streaming and accept connections from standby servers. This section walks you through the required changes to PostgreSQL's configuration files and the creation of a replication role.
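Before editing any configuration, it is worth sanity-checking the prerequisites above on each node. A minimal sketch, assuming chrony and a local psql client (adjust tool and service names for your distribution):

# Confirm the PostgreSQL major version matches across all three nodes
psql -U postgres -c 'SHOW server_version;'

# Confirm the clock is synchronized
chronyc tracking

# Confirm kernel settings
sysctl vm.swappiness kernel.shmmax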
I. Config file
As the master, Node-1 must generate and retain sufficient WAL segments for standby servers to consume. Update postgresql.conf with the following settings.
# Enable WAL suitable for physical replication
wal_level = replica # 'logical' if you also need logical slots
# Allow enough WAL sender processes for your standby servers
max_wal_senders = 10 # >= number of stand-bys + maintenance head-room
# Maintain replication slots to prevent WAL removal too early
max_replication_slots = 10 # same logic as above
# Keep enough WAL files to let standbys catch up without recycling
wal_keep_size = 512MB # prevents WAL recycling before stand-bys catch up
# Listen on the master's address (or '*' for all interfaces)
listen_addresses = '10.0.0.10' # or '*' to listen on every interface
# Choose commit behavior: 'local' favors speed; 'remote_apply' waits for standbys
synchronous_commit = local # 'remote_apply' for a synchronous setup
Tip: Replication slots (configured below) are the preferred way to retain WAL, because the master then keeps segments exactly as long as each standby still needs them; wal_keep_size (which replaced wal_keep_segments in v13) acts as a simpler fixed-size safety net on top.
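Once Node-1 has been restarted (step IV below), you can confirm the settings actually took effect with a quick look at pg_settings:

SELECT name, setting
FROM pg_settings
WHERE name IN ('wal_level', 'max_wal_senders', 'max_replication_slots',
               'wal_keep_size', 'listen_addresses', 'synchronous_commit');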
II. pg_hba.conf
Standby servers must be allowed to connect for replication. Add entries to pg_hba.conf specifying the replication database, user, and host addresses.
# TYPE DATABASE USER ADDRESS METHOD
host replication replicator 10.0.0.11/32 md5
host replication replicator 10.0.0.12/32 md5
III. Create the replication role
A dedicated role with replication privileges is required for streaming WAL to standbys. Execute the following SQL on the master:
CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'S3cUr3P@ss!';
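To double-check that the role exists with the right attributes, query pg_roles:

SELECT rolname, rolreplication, rolcanlogin
FROM pg_roles
WHERE rolname = 'replicator';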
IV. Restart Node-1
After updating the configuration files and creating the role, restart PostgreSQL to apply the changes; parameters such as wal_level, listen_addresses, and max_wal_senders require a full restart rather than a reload.
sudo systemctl restart postgresql
Once the master is prepared, each standby (Node-2 and Node-3) needs to be bootstrapped from a base backup and configured to stream WAL. You will perform the same steps on both replicas, adjusting only the connection details as needed.
I. Stop PostgreSQL
Ensure PostgreSQL is not running on the replica before taking the base backup or restoring data.
sudo systemctl stop postgresql
II. Take a base backup from the master
Use pg_basebackup to clone the primary's data directory, stream WAL, and automatically write recovery settings.
sudo -u postgres pg_basebackup \
-h 10.0.0.10 \
-D /var/lib/pgsql/15/data \
-U replicator \
-W -P --wal-method=stream --write-recovery-conf
The --write-recovery-conf switch auto-creates standby.signal and writes primary_conninfo into postgresql.auto.conf.
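You can verify both artifacts right after the backup completes (paths follow the example above):

# standby.signal marks this data directory as a standby
ls /var/lib/pgsql/15/data/standby.signal

# --write-recovery-conf appended the connection string here
grep primary_conninfo /var/lib/pgsql/15/data/postgresql.auto.conf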
III. (If you skipped --write-recovery-conf) edit postgresql.conf
Manually enable hot standby mode and configure connection to the master, specifying the replication slot unique to each standby.
hot_standby = on
primary_conninfo = 'host=10.0.0.10 port=5432 user=replicator password=S3cUr3P@ss!'
primary_slot_name = 'node2_slot' # unique per standby (see slots below)
If needed, create the standby signal file to trigger recovery mode (PostgreSQL v12+):
touch /var/lib/pgsql/15/data/standby.signal
IV. Start the replica
With configuration in place, start PostgreSQL on the standby to begin streaming WAL from the master.
sudo systemctl start postgresql
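If startup succeeds, the standby spawns a WAL receiver process. You can confirm streaming from the replica itself; pg_stat_wal_receiver shows exactly one row while the connection is up:

SELECT status, sender_host, slot_name
FROM pg_stat_wal_receiver;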
After configuring both master and standbys, you should verify that WAL streaming is active and data changes propagate as expected. This section covers querying replication status and performing a simple functional test.
I. On Node-1
Check the replication status view on the master to confirm standbys are connected and streaming.
SELECT client_addr, state, sync_state
FROM pg_stat_replication;
You should see rows for 10.0.0.11 and 10.0.0.12 with state = streaming. If you configured synchronous replication (synchronous_standby_names plus synchronous_commit = remote_apply), the sync_state column will show sync for the chosen standbys.
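The same view also reports per-standby lag, which is the first thing to watch in day-to-day operation:

SELECT client_addr, write_lag, flush_lag, replay_lag
FROM pg_stat_replication;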
II. Functional test
Perform a simple create-and-read test to confirm that changes on the master appear on the replica almost immediately.
-- Master (10.0.0.10)
CREATE TABLE replication_test (id SERIAL PRIMARY KEY, data TEXT);
INSERT INTO replication_test (data) VALUES ('Hello replicas');
-- Replica (10.0.0.11 or .12)
SELECT * FROM replication_test; -- should return 1 row almost instantly
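On a replica you can also quantify how far behind it is by looking at the timestamp of the last replayed transaction, a handy check for monitoring scripts. Note that on an idle master this value keeps growing even though nothing is actually behind:

-- Run on the replica
SELECT now() - pg_last_xact_replay_timestamp() AS replication_delay;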
In a production environment, you need a plan for promoting a standby to master when the primary fails. This section outlines a manual fail-over process, though automation tools can streamline detection and promotion.
I. Detect failure of Node-1 (often automated with Patroni, pg_auto_failover, etc.).
II. Promote Node-2 (example)
sudo -u postgres pg_ctl promote -D /var/lib/pgsql/15/data
or, from a SQL session on the replica (v12+):
SELECT pg_promote();
III. Redirect applications to the new master (10.0.0.11).
IV. Re-configure the old master (once repaired) as a replica: wipe its data directory, repeat the "Replica" steps, and give it a new slot (node1_as_replica_slot).
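If the old master's data directory is intact, pg_rewind can often resynchronize it against the new master without a full re-clone; note this requires wal_log_hints = on (or data checksums) to have been enabled beforehand. A sketch, run on Node-1 while its PostgreSQL is stopped:

sudo -u postgres pg_rewind \
  --target-pgdata=/var/lib/pgsql/15/data \
  --source-server='host=10.0.0.11 port=5432 user=postgres'

Afterwards, create standby.signal and set primary_conninfo exactly as in the replica steps.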
Replication slots ensure that WAL segments needed by a standby are retained on the master until they have been safely replayed. This prevents standbys from falling too far behind and losing data.
On Node-1:
SELECT pg_create_physical_replication_slot('node2_slot');
SELECT pg_create_physical_replication_slot('node3_slot');
Then, on each replica's postgresql.conf, match the slot:
primary_slot_name = 'node2_slot' # or node3_slot accordingly
Slots guarantee that WAL remains available until every replica has replayed it, preventing the dreaded "requested WAL segment has already been removed" error.
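The flip side is that a slot whose standby never returns retains WAL forever and can fill the master's disk, so monitor slots and drop abandoned ones:

-- Is each slot in use, and how much WAL is it pinning?
SELECT slot_name, active, restart_lsn
FROM pg_replication_slots;

-- Remove a slot once its standby is permanently gone
SELECT pg_drop_replication_slot('node3_slot');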
Beyond basic replication, additional configuration can improve resilience, disaster recovery capabilities, and performance. Consider these settings as part of a hardened, high-availability setup:
| Setting | Why it matters (see the diagram above for scope) |
| --- | --- |
| checkpoint_timeout = 15min | Longer intervals reduce checkpoint I/O on Node-1; make sure the replicas can still catch up. |
| archive_mode = on + archive_command | Ships WAL off-site for disaster recovery beyond Node-2 & Node-3. |
| hot_standby_feedback = on | Keeps long-running replica queries from being cancelled by vacuum on the master, at the cost of possible extra bloat on the master. |
| backup_label & recovery_target settings | Enable point-in-time recovery (PITR) if required. |
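As an illustration of the archiving row above, a minimal setup on Node-1 might look like the following; the archive path /mnt/wal_archive is an assumption for this sketch, and production systems usually ship WAL to network or object storage instead:

archive_mode = on
archive_command = 'test ! -f /mnt/wal_archive/%f && cp %p /mnt/wal_archive/%f'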