Imagine you're organizing books in a vast library with shelves arranged in a circle. Each book's position is chosen by the first letter of its title, looping back to the beginning after Z. When you install a new shelf or remove one, you'd prefer not to reshuffle every book; only a small, predictable subset should move. Consistent hashing gives distributed computer systems that same flexibility: data spreads evenly across servers, yet when servers come or go, only a fraction of keys are remapped.
After working through this material you should be able to explain how the hash ring is laid out, why only a fraction of keys move when the node set changes, and how virtual nodes smooth out load.
Consistent hashing imagines the entire hash space as a circle.
A typical implementation uses a 32-bit hash (0 to $2^{32} - 1$); after the last value the count "wraps" back to 0, so the ends meet like the edges of a clock dial.
Hash Ring:

```
              0° (hash 0 or 2³² - 1)
                        │
                        │
                        │
270° (¾·2³²) ───────────┼─────────── 90° (¼·2³²)
                        │
                        │
                        │
              180° (½·2³²)
```
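For intuition, here is a tiny sketch (the key name is made up) that hashes a key into the 32-bit space and expresses its position as an angle on the dial above:

```python
import hashlib

RING_SIZE = 2**32  # positions 0 .. 2**32 - 1; after that we wrap back to 0

def ring_position(key):
    # Truncate a SHA-1 digest to 32 bits to get a position on the ring
    return int(hashlib.sha1(key.encode()).hexdigest(), 16) % RING_SIZE

def ring_angle(key):
    # Express the position as an angle, matching the clock-dial picture
    return ring_position(key) / RING_SIZE * 360

print(f"{ring_angle('session:42'):.1f}°")  # lands somewhere between 0° and 360°
```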
Assume three servers: Node A, Node B, and Node C.
| Node   | Hash angle | Covers hash interval* |
|--------|------------|-----------------------|
| Node A | 0°         | (240° → 0°]           |
| Node B | 120°       | (0° → 120°]           |
| Node C | 240°       | (120° → 240°]         |
*Each interval is open at the start and closed at the end, so every key maps to exactly one node.
| Key | Hash angle | Stored on…                    |
|-----|------------|-------------------------------|
| K1  | 100°       | Node B                        |
| K2  | 200°       | Node C                        |
| K3  | 330°       | Node A (0°), wraps past 360°  |
A major advantage of consistent hashing is that only the keys that lie in the affected arc move when the node set changes.
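A quick way to see this is to assign many keys with three nodes, add a fourth, and count how many assignments change. The sketch below (node and key names are illustrative) uses the same clockwise rule as the tables above:

```python
import hashlib

def position(name):
    # Position of a node or key on the ring, derived from SHA-1
    return int(hashlib.sha1(name.encode()).hexdigest(), 16)

def owner(key, nodes):
    # A key belongs to the first node clockwise from the key's position
    key_pos = position(key)
    ordered = sorted(nodes, key=position)
    for node in ordered:
        if key_pos <= position(node):
            return node
    return ordered[0]  # wrap around past the end of the ring

keys = [f"key-{i}" for i in range(10_000)]
before = {k: owner(k, ['NodeA', 'NodeB', 'NodeC']) for k in keys}
after = {k: owner(k, ['NodeA', 'NodeB', 'NodeC', 'NodeD']) for k in keys}

moved = sum(before[k] != after[k] for k in keys)
print(f"{moved / len(keys):.1%} of keys moved")  # only keys in NodeD's new arc move
```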
To smooth out load in clusters with uneven node counts or heterogeneous hardware, each physical server is hashed many times, producing virtual nodes scattered around the ring. Requests are balanced across those VNodes, so even if one physical machine is temporarily overloaded only a small slice of the hash space suffers.
Consider a web application using a distributed cache to store session data. Without consistent hashing, a common approach assigns each key with simple modulo arithmetic:

```python
server = hash(key) % number_of_servers
```

Adding or removing a cache server changes `number_of_servers`, causing most keys to remap to different servers. To enhance data distribution and fault tolerance, consistent hashing often uses virtual nodes.
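A small sketch of the modulo problem (key names are made up), counting how many keys change servers when a fourth server joins:

```python
# Simulate hash(key) % number_of_servers before and after adding a server
keys = [f"user-session-{i}" for i in range(10_000)]

before = {k: hash(k) % 3 for k in keys}  # three cache servers
after = {k: hash(k) % 4 for k in keys}   # a fourth server is added

moved = sum(before[k] != after[k] for k in keys)
print(f"{moved / len(keys):.1%} of keys changed servers")  # typically around 75%
```

Contrast that with the earlier ring sketch, where only the keys falling into the new node's arc are remapped.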
Let's walk through how you might implement consistent hashing in practice.
1. Choose a hash function that distributes values uniformly, such as MD5 or SHA-1.
2. Assign each node (or virtual node) a hash value to determine its position on the ring.
3. For each data key, compute its hash and store the key on the first node whose position lies at or clockwise past that hash.
```python
# Import a reliable hash function
import hashlib

# Function to compute hash value
def compute_hash(key):
    return int(hashlib.sha1(key.encode()).hexdigest(), 16)

# Nodes in the system
nodes = ['NodeA', 'NodeB', 'NodeC']

# Create the ring with virtual nodes
ring = {}
virtual_nodes = 100  # Number of virtual nodes per physical node
for node in nodes:
    for i in range(virtual_nodes):
        vnode_key = f"{node}:{i}"
        hash_val = compute_hash(vnode_key)
        ring[hash_val] = node

# Sort the hash ring
sorted_hashes = sorted(ring.keys())

# Function to find the node for a given key
def get_node(key):
    hash_val = compute_hash(key)
    for node_hash in sorted_hashes:
        if hash_val <= node_hash:
            return ring[node_hash]
    return ring[sorted_hashes[0]]  # Wrap around

# Example usage
key = 'my_data_key'
assigned_node = get_node(key)
print(f"Key '{key}' is assigned to {assigned_node}")
```
The output shows that `'my_data_key'` is assigned to a node based solely on its hash value.
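The linear scan in `get_node` is fine for small rings; for larger ones, a binary search over the sorted positions gives O(log n) lookups. A minimal sketch using Python's standard `bisect` module, reusing `compute_hash`, `ring`, and `sorted_hashes` from above:

```python
import bisect

def get_node_fast(key):
    # Binary-search the sorted virtual-node positions instead of scanning them
    hash_val = compute_hash(key)
    idx = bisect.bisect_left(sorted_hashes, hash_val)
    if idx == len(sorted_hashes):
        idx = 0  # past the last position: wrap around to the start of the ring
    return ring[sorted_hashes[idx]]

print(get_node_fast('my_data_key'))  # same node as get_node('my_data_key')
```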
| System | Why it uses consistent hashing | Notable wrinkles |
|--------|--------------------------------|------------------|
| Amazon Dynamo / DynamoDB | Online shopping carts must survive node loss during Black Friday | Requires anti-entropy sync to handle "sloppy" quorums |
| Apache Cassandra | Ring topology underpins token ranges for partitions | Hot partitions can appear if token assignment is careless |
| Memcached client libraries | Keeps cache-hit rate high during live scaling | Needs client-side agreement on the hash function |
| CDNs (e.g., Akamai) | Predictable client → edge-server mapping | Must integrate geo-routing and health checks |
Implementation challenges include choosing an unbiased hash function, deciding how many virtual nodes to assign per server, and coordinating ring state changes across clients.
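To get a feel for the virtual-node trade-off, here is a rough measurement sketch (node and key names are illustrative) showing how evenly keys spread across three nodes as the number of virtual nodes grows:

```python
import hashlib
from collections import Counter

def position(name):
    return int(hashlib.sha1(name.encode()).hexdigest(), 16)

def build_ring(nodes, vnodes_per_node):
    # Each physical node contributes many virtual positions on the ring
    return {position(f"{node}:{i}"): node
            for node in nodes for i in range(vnodes_per_node)}

def owner(key, ring, sorted_positions):
    key_pos = position(key)
    for p in sorted_positions:
        if key_pos <= p:
            return ring[p]
    return ring[sorted_positions[0]]  # wrap around

keys = [f"key-{i}" for i in range(30_000)]
for vnodes in (1, 10, 100):
    ring = build_ring(['NodeA', 'NodeB', 'NodeC'], vnodes)
    order = sorted(ring)
    load = Counter(owner(k, ring, order) for k in keys)
    print(vnodes, dict(load))  # counts move toward ~10,000 each as vnodes grow
```

In practice the right VNode count balances evenness of load against the memory and coordination cost of a larger ring.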