Last modified: January 24, 2026

Replication

Replication maintains copies of the same data on multiple nodes in a distributed system. It improves availability, because the data survives the loss of a node; reduces latency, because clients can read from a nearby replica; and distributes read load across machines. The notes below cover single-leader, multi-leader, and leaderless replication, with simple ASCII diagrams to illustrate how each can be structured.

            +---------+
            |  Client |
            +----+----+
                 |
         Read/Write Requests
                 |
                 v
     +-----------+-----------+
     |        Leader         |  (Single Leader Replication)
     +-----------+-----------+
                 |  Replication Log
     +-----------+-----------+
     |      Follower(s)      |
     +-----------------------+

Single Leader Replication

Single leader replication designates one node as the leader, which receives all write operations. The followers continuously replicate changes from the leader, ensuring that each follower eventually converges to the same state.

+-----------+           +------------+
|   Leader  |  Log ---> |  Follower  |
+-----+-----+           +------------+
      |                      
      | Log               
      v                      
+-----------+                 
| Follower  |                 
+-----------+
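
The flow in the diagram can be sketched in a few lines of Python. This is a minimal in-memory model, not a real database: the class names `Leader` and `Follower`, the `applied` counter, and the explicit `replicate()` call are all assumptions made for illustration.

```python
class Follower:
    """Replica that applies log entries received from the leader, in order."""

    def __init__(self):
        self.data = {}
        self.applied = 0  # number of log entries applied so far

    def apply(self, entries):
        for key, value in entries:
            self.data[key] = value
            self.applied += 1


class Leader:
    """The only node that accepts writes; each write is appended to a log."""

    def __init__(self, followers):
        self.data = {}
        self.log = []
        self.followers = followers

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))

    def replicate(self):
        # Ship each follower only the entries it has not applied yet.
        for follower in self.followers:
            follower.apply(self.log[follower.applied:])


followers = [Follower(), Follower()]
leader = Leader(followers)
leader.write("x", 1)
leader.write("x", 2)
leader.write("y", 3)
leader.replicate()
print(all(f.data == leader.data for f in followers))
```

Because every follower applies the same entries in the same order, each one converges to the leader's state once it has consumed the whole log.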

Managing Leader Failure

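When the leader fails or becomes unreachable, the system must perform a failover: detect the failure, promote a follower to be the new leader, and redirect clients and the remaining followers to it. This process must be handled carefully to avoid data loss and minimize downtime; for example, writes that the old leader had not yet replicated may be lost when a follower takes over.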
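
One simple promotion rule is to pick the reachable follower that has applied the most log entries, since it loses the fewest writes. The sketch below assumes each follower is described by a small dictionary with hypothetical `name`, `log_position`, and `reachable` fields; production systems generally rely on timeouts for failure detection and a consensus protocol for the election itself, neither of which is modeled here.

```python
def elect_new_leader(followers):
    """Pick the reachable follower with the highest applied log position."""
    candidates = [f for f in followers if f["reachable"]]
    if not candidates:
        raise RuntimeError("no reachable follower to promote")
    return max(candidates, key=lambda f: f["log_position"])


def failover(old_leader_position, followers):
    """Return the promoted follower's name and how many writes were lost."""
    new_leader = elect_new_leader(followers)
    lost_writes = max(0, old_leader_position - new_leader["log_position"])
    return new_leader["name"], lost_writes


followers_state = [
    {"name": "f1", "log_position": 97, "reachable": True},
    {"name": "f2", "log_position": 100, "reachable": False},
    {"name": "f3", "log_position": 99, "reachable": True},
]
print(failover(100, followers_state))
```

Here `f2` is fully caught up but unreachable, so `f3` is promoted and the one write it had not yet received is lost.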

Implementing the Replication Log

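Replication logs form the backbone of data propagation from the leader to followers. Two common strategies are statement-based replication, in which followers re-execute the SQL statements that ran on the leader, and log-based replication, in which the leader ships its write-ahead log. Statement-based replication can diverge when a statement is nondeterministic (for example, one that calls NOW() or RAND()), whereas a log of the resulting changes applies identically on every follower.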

   +-------------+
   |  Leader DB  |
   +------+------+ 
          |  (Log Records) 
          v 
   +-------------+
   | Follower DB |
   +-------------+
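
The difference between the two strategies shows up when a statement depends on node-local state. The following sketch is an assumption-laden toy: a "statement" is modeled as a Python function of the node's local clock (standing in for something like NOW()), and a "log record" is simply the resulting key and value.

```python
def apply_statements(statements, local_clock):
    """Statement-based: re-execute each statement using this node's clock."""
    db = {}
    for key, expression in statements:
        db[key] = expression(local_clock)
    return db


def apply_log_records(log_records):
    """Log-based: apply the values the leader already computed."""
    db = {}
    for key, value in log_records:
        db[key] = value
    return db


# Models a statement such as INSERT ... VALUES (NOW()).
statements = [("created_at", lambda now: now)]

leader_db = apply_statements(statements, local_clock=1000)
follower_stmt = apply_statements(statements, local_clock=1003)  # runs later
follower_log = apply_log_records(list(leader_db.items()))

print(follower_stmt == leader_db)
print(follower_log == leader_db)
```

The follower that re-executes the statement ends up with a different timestamp than the leader, while the follower that applies log records matches it exactly.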

Replication Lag and Eventual Consistency

In distributed systems, the delay between a write being committed on the leader and becoming visible on a follower is known as replication lag. While a follower lags, reads served from it may return stale data. If writes stop, the followers catch up and all replicas converge to the same state, a guarantee known as eventual consistency; the longer the lag, the longer stale reads can persist.
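
Lag can be expressed as the number of log entries a follower has not yet applied. The sketch below computes that figure and also shows one possible way to cope with it: routing a client's read to a replica that already holds that client's most recent write. The function names and the position-based bookkeeping are assumptions for illustration only.

```python
def replication_lag(leader_position, follower_positions):
    """Lag per follower, measured in log entries not yet applied."""
    return {name: leader_position - pos
            for name, pos in follower_positions.items()}


def choose_replica(last_write_position, follower_positions, leader="leader"):
    """Route a read to a follower that already holds the client's last write."""
    for name, position in follower_positions.items():
        if position >= last_write_position:
            return name
    return leader  # every follower is stale for this client


positions = {"f1": 40, "f2": 42}  # hypothetical applied log positions
lag = replication_lag(42, positions)
print(lag)
print(choose_replica(42, positions))
print(choose_replica(43, positions))
```

With the leader at position 42, `f1` is two entries behind and `f2` is caught up, so a client whose last write was entry 42 can read from `f2`, while a client that has just written entry 43 falls back to the leader.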

Multi Leader Replication

In a multi-leader setup, more than one node accepts writes, and each leader replicates its changes to the others. This is useful for geographically distributed deployments or for cases where local write performance is prioritized. The trade-off is conflict handling: the same record can be modified concurrently on different leaders, so conflicting writes must be detected and resolved.

          +-------+
          |Node A |
          +---+---+
              | ^ 
    (Writes)  | |   (Writes)
              v |
          +---+---+
          |Node B |
          +---+---+
              | ^
    (Writes)  | |   (Writes)
              v |
          +---+---+
          |Node C |
          +-------+
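
One widely used (and lossy) way to resolve conflicting writes is last-write-wins: every write carries a timestamp, and the version with the highest timestamp is kept. The sketch below assumes each value is stored as a `(timestamp, node_id, payload)` tuple, with the node ID acting as a tie-breaker; both the layout and the function name `merge_lww` are illustrative choices, not part of any particular system.

```python
def merge_lww(local, remote):
    """Merge two replicas' states, keeping the newest version of each key."""
    merged = dict(local)
    for key, remote_version in remote.items():
        local_version = merged.get(key)
        # Tuples compare by timestamp first, then node_id as a tie-breaker.
        if local_version is None or remote_version > local_version:
            merged[key] = remote_version
    return merged


node_a = {"title": (10, "A", "Draft v1")}
node_b = {"title": (12, "B", "Draft v2"), "owner": (5, "B", "sam")}

merged_ab = merge_lww(node_a, node_b)
merged_ba = merge_lww(node_b, node_a)
print(merged_ab == merged_ba)
print(merged_ab["title"][2])
```

Both merge orders produce the same result, so the leaders converge, but the write made on Node A is silently discarded. Systems that cannot tolerate that loss need a different strategy, such as keeping both versions for the application to reconcile.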

Leaderless Replication

Leaderless systems eliminate the leader role entirely, allowing any replica to accept writes. Such systems typically rely on quorums: with n replicas, a write must be acknowledged by w of them and a read must query r of them. Choosing w + r > n ensures that every read set overlaps with every write set in at least one replica, which aims to keep reads consistent without centralized coordination.

Leaderless Model
          +-----+
 Write -> |Node1| <-----+
          +-----+       |
                        |
          +-----+    +-----+
Read  <-  |Node2| <--|Node3|
          +-----+    +-----+
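
A quorum read and write can be sketched as follows, assuming n = 3 replicas, a write quorum w = 2, and a read quorum r = 2, so that w + r > n. The `Replica` class, the integer version numbers, and the use of `None` to model an unreachable node are all assumptions made for this illustration.

```python
class Replica:
    def __init__(self):
        self.store = {}  # key -> (version, value)

    def put(self, key, version, value):
        current = self.store.get(key)
        if current is None or version > current[0]:
            self.store[key] = (version, value)

    def get(self, key):
        return self.store.get(key, (0, None))


def quorum_write(replicas, key, version, value, w):
    """Write to every reachable replica; succeed if at least w acknowledge."""
    acks = 0
    for replica in replicas:
        if replica is not None:  # None models an unreachable node
            replica.put(key, version, value)
            acks += 1
    return acks >= w


def quorum_read(replicas, key, r):
    """Query r reachable replicas and return the value with the newest version."""
    responses = [rep.get(key) for rep in replicas if rep is not None][:r]
    if len(responses) < r:
        raise RuntimeError("not enough replicas for a read quorum")
    return max(responses, key=lambda resp: resp[0])[1]


nodes = [Replica(), Replica(), Replica()]
quorum_write(nodes, "k", 1, "old", w=2)
# Node 3 is unreachable during the second write, so it keeps the old value.
ok = quorum_write([nodes[0], nodes[1], None], "k", 2, "new", w=2)
# Node 1 is unreachable during the read; nodes 2 and 3 answer.
value = quorum_read([None, nodes[1], nodes[2]], "k", r=2)
print(ok, value)
```

Even though one of the two replicas answering the read is stale, the other took part in the latest write, so comparing versions returns the new value.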