Last modified: December 05, 2024

This article is written in: 🇺🇸

Consistent Hashing

Imagine you're organizing books in a vast library with shelves arranged in a circle. Each book is placed on a shelf based on its title's position in the alphabet, looping back to the beginning after 'Z'. If you add a new shelf or remove one, you wouldn't want to reshuffle all the books—just a few should need to move. Consistent hashing works similarly in computer systems, allowing data to be distributed across multiple servers efficiently, even as servers are added or removed.

The Hash Ring Concept

Consistent hashing uses a logical ring to represent the entire range of possible hash values. Both data items and nodes (servers) are mapped onto this ring using a hash function.

Visualizing the Hash Ring:

#
                      +---------+
                      |         |
                +-----+   0°    +-----+
                |     |         |     |
                |     +---------+     |
                |                     |
         +------+                     +------+
         |                                    |
    270° +                                    + 90°
         |                                    |
         +------+                     +------+
                |                     |
                |     +---------+     |
                |     |         |     |
                +-----+  180°   +-----+
                      |         |
                      +---------+

Mapping Nodes and Data onto the Ring

Suppose we have three nodes—Node A, Node B, and Node C—and several data keys that need to be stored.

Assigning Nodes to the Ring:

Visual Representation with Nodes:

#
                      +---------+
                      |  Node A |
                +-----+   0°    +-----+
                |     |         |     |
                |     +---------+     |
                |                     |
         +------+                     +------+
         |                                    |
         |                                    |
         |                                    |
         +------+                     +------+
                |                     |
                |     +---------+     |
                |     | Node C  |     |
                +-----+ 240°    +-----+
                      |         |
                      +---------+

Placing Data Items on the Ring:

Let's say we have data items with keys K1, K2, and K3.

Visual Representation with Data:

#
                      +---------+
                      |  Node A |
                +-----+   0°    +-----+
                |     |         |     |
                |     +---------+     |
                |         ↑           |
         +------+        K3           +------+
         |                                    |
         |                                    |
         |                                    |
         +------+                     +------+
                |                     |
                |     +---------+     |
                |     | Node C  |     |
                +-----+ 240°    +-----+
                      |    ↑    |
                      |   K2    |

How Data Assignment Works

In consistent hashing, each data item is assigned to the next node encountered when moving clockwise around the ring.

Complete Ring with Nodes and Data:

#
                      +---------+
                      |  Node A |
                +-----+   0°    +-----+
                |     |    ↑    |     |
                |     +--- K3 ---+     |
                |                     |
         +------+                     +------+
         |                                    |
         |                                    |
         |                                    |
         +------+                     +------+
                |                     |
                |     +---------+     |
                |     | Node C  |     |
                +-----+ 240°    +-----+
                      |    ↑    |
                      |   K2    |

Adding and Removing Nodes

One of the strengths of consistent hashing is that it minimizes the amount of data that needs to move when nodes join or leave the system.

Adding a New Node

Suppose we add Node D that hashes to 80°.

Ring After Adding Node D:

#
                      +---------+
                      |  Node A |
                +-----+   0°    +-----+
                |     |         |     |
                |     +---------+     |
                |                     |
         +------+                     +------+
         |            Node D (80°)            |
         |                ↑                   |
         |               K1                   |
         +------+                     +------+
                |                     |
                |     +---------+     |
                |     | Node C  |     |
                +-----+ 240°    +-----+
                      |    ↑    |
                      |   K2    |

Removing a Node

If Node B leaves the system:

Practical Example: Distributed Caching with Consistent Hashing

Consider a web application using a distributed cache to store session data. Let's see how consistent hashing helps in this scenario.

Without Consistent Hashing

With Consistent Hashing

Virtual Nodes (VNodes)

To enhance data distribution and fault tolerance, consistent hashing often uses virtual nodes.

What Are Virtual Nodes?

Benefits of Virtual Nodes

Implementing Consistent Hashing

Let's walk through how you might implement consistent hashing in practice.

Step 1: Hash Function Selection

Choose a hash function that distributes values uniformly, such as MD5 or SHA-1.

Step 2: Mapping Nodes to the Ring

Assign each node (or virtual node) a hash value to determine its position on the ring.

Step 3: Mapping Keys to Nodes

For each data key:

  1. Compute its hash value.
  2. Locate the first node on the ring whose hash value is greater than or equal to the key's hash.
  3. If no such node exists (the key's hash is higher than any node's hash), wrap around to the first node on the ring.

Sample Code Snippet (Pseudocode)

# Import a reliable hash function
import hashlib

# Function to compute hash value
def compute_hash(key):
    return int(hashlib.sha1(key.encode()).hexdigest(), 16)

# Nodes in the system
nodes = ['NodeA', 'NodeB', 'NodeC']

# Create the ring with virtual nodes
ring = {}
virtual_nodes = 100  # Number of virtual nodes per physical node

for node in nodes:
    for i in range(virtual_nodes):
        vnode_key = f"{node}:{i}"
        hash_val = compute_hash(vnode_key)
        ring[hash_val] = node

# Sort the hash ring
sorted_hashes = sorted(ring.keys())

# Function to find the node for a given key
def get_node(key):
    hash_val = compute_hash(key)
    for node_hash in sorted_hashes:
        if hash_val <= node_hash:
            return ring[node_hash]
    return ring[sorted_hashes[0]]  # Wrap around

# Example usage
key = 'my_data_key'
assigned_node = get_node(key)
print(f"Key '{key}' is assigned to {assigned_node}")

Interpreting the Output

Real-World Applications

Distributed Databases: Apache Cassandra

Distributed Cache: Amazon DynamoDB

Load Balancing: Web Servers

Advantages of Consistent Hashing

Challenges and Considerations

Best Practices

Table of Contents

    Consistent Hashing
    1. The Hash Ring Concept
    2. Mapping Nodes and Data onto the Ring
    3. How Data Assignment Works
    4. Adding and Removing Nodes
      1. Adding a New Node
      2. Removing a Node
    5. Practical Example: Distributed Caching with Consistent Hashing
      1. Without Consistent Hashing
      2. With Consistent Hashing
    6. Virtual Nodes (VNodes)
      1. What Are Virtual Nodes?
      2. Benefits of Virtual Nodes
    7. Implementing Consistent Hashing
      1. Step 1: Hash Function Selection
      2. Step 2: Mapping Nodes to the Ring
      3. Step 3: Mapping Keys to Nodes
      4. Sample Code Snippet (Pseudocode)
      5. Interpreting the Output
    8. Real-World Applications
      1. Distributed Databases: Apache Cassandra
      2. Distributed Cache: Amazon DynamoDB
      3. Load Balancing: Web Servers
    9. Advantages of Consistent Hashing
    10. Challenges and Considerations
    11. Best Practices