Consistent hashing maps both servers and keys onto a circular hash ring (0 to 2^32-1). Each key is assigned to the nearest server clockwise. When a server is added or removed, only K/N keys move (K = total keys, N = total servers), instead of rehashing everything. Virtual nodes solve uneven distribution by placing each physical server at multiple ring positions. Used by DynamoDB, Cassandra, Memcached, and CDNs.
You have 5 cache servers. They are handling 10 million keys nicely. Then one server crashes.
With traditional hashing, almost every key now maps to a different server. Your cache hit rate drops to near zero. All 10 million keys effectively need to be re-fetched from the database. Your database gets slammed.
This is the problem consistent hashing solves. When that server crashes, only about 2 million keys (the ones that belonged to that server) need to be redistributed. The other 8 million keys stay exactly where they are. No cache stampede. No database meltdown.
Consistent hashing is one of those techniques that shows up everywhere in distributed systems. DynamoDB, Cassandra, Memcached, CDNs, load balancers. Once you understand how it works, you will see it in almost every system design discussion.
The Problem: Traditional Hashing Breaks When Servers Change
Let’s say you have 4 cache servers and you want to distribute keys across them. The simple approach is:
```
server = hash(key) % number_of_servers
```
With 4 servers:
| Key | hash(key) | hash(key) % 4 | Server |
|---|---|---|---|
| user:101 | 78234 | 2 | Server 2 |
| user:102 | 91456 | 0 | Server 0 |
| user:103 | 33219 | 3 | Server 3 |
| user:104 | 45678 | 2 | Server 2 |
| user:105 | 12345 | 1 | Server 1 |
This works fine. Until it doesn’t.
What Happens When a Server Goes Down
Server 3 crashes. Now you have 3 servers. The formula changes to hash(key) % 3:
| Key | hash(key) | hash(key) % 3 | New Server | Moved? |
|---|---|---|---|---|
| user:101 | 78234 | 0 | Server 0 | Yes |
| user:102 | 91456 | 2 | Server 2 | Yes |
| user:103 | 33219 | 0 | Server 0 | Yes |
| user:104 | 45678 | 0 | Server 0 | Yes |
| user:105 | 12345 | 0 | Server 0 | Yes |
Every single key now maps to a different server. Your cache hit rate drops from ~100% to close to 0%.
```mermaid
graph LR
subgraph "Before: 4 Servers"
K1["user:101"] --> S2a["Server 2"]
K2["user:102"] --> S0a["Server 0"]
K3["user:103"] --> S3a["Server 3"]
K4["user:104"] --> S2a
K5["user:105"] --> S1a["Server 1"]
end
style S3a fill:#fee2e2,stroke:#dc2626,stroke-width:2px
style S0a fill:#dbeafe,stroke:#3b82f6,stroke-width:2px
style S1a fill:#dbeafe,stroke:#3b82f6,stroke-width:2px
style S2a fill:#dbeafe,stroke:#3b82f6,stroke-width:2px
```

```mermaid
graph LR
subgraph "After: 3 Servers (all keys remapped)"
K1b["user:101"] --> S0b["Server 0"]
K2b["user:102"] --> S2b["Server 2"]
K3b["user:103"] --> S0b
K4b["user:104"] --> S0b
K5b["user:105"] --> S0b
end
style S0b fill:#dcfce7,stroke:#16a34a,stroke-width:2px
style S2b fill:#dcfce7,stroke:#16a34a,stroke-width:2px
```
This is unacceptable for production systems. Losing one server should not mean losing your entire cache.
What Happens When You Add a Server
Same problem in reverse. You add a 5th server to handle more load. The formula changes to hash(key) % 5. Again, almost every key maps to a different server. Your cache just got wiped out during the exact moment you needed more capacity.
The root cause is simple: the modulo operation depends on the number of servers. Change the number, and everything shifts.
How Consistent Hashing Solves This
Consistent hashing gets rid of the modulo operation entirely. Instead of hash(key) % N, it places both servers and keys on a circular ring.
The Hash Ring
Imagine a circle (ring) with positions from 0 to 2^32 - 1 (about 4.3 billion positions). This is the hash ring.
- Hash each server to get its position on the ring
- Hash each key to get its position on the ring
- Walk clockwise from the key’s position until you hit a server. That server owns the key.
```mermaid
graph TB
subgraph "Consistent Hashing Ring"
direction TB
R["0 / 2³²"]
A["Server A<br/>(position 1000)"]
B["Server B<br/>(position 4500)"]
C["Server C<br/>(position 7200)"]
K1["key1<br/>(pos 800)"]
K2["key2<br/>(pos 2300)"]
K3["key3<br/>(pos 5100)"]
K4["key4<br/>(pos 9500)"]
K1 -->|clockwise| A
K2 -->|clockwise| B
K3 -->|clockwise| C
K4 -->|clockwise| A
end
style A fill:#dbeafe,stroke:#3b82f6,stroke-width:2px
style B fill:#dcfce7,stroke:#16a34a,stroke-width:2px
style C fill:#fef3c7,stroke:#f59e0b,stroke-width:2px
style K1 fill:#f3f4f6,stroke:#6b7280,stroke-width:1px
style K2 fill:#f3f4f6,stroke:#6b7280,stroke-width:1px
style K3 fill:#f3f4f6,stroke:#6b7280,stroke-width:1px
style K4 fill:#f3f4f6,stroke:#6b7280,stroke-width:1px
```
In this setup:
- key1 (position 800) walks clockwise and hits Server A (position 1000)
- key2 (position 2300) walks clockwise and hits Server B (position 4500)
- key3 (position 5100) walks clockwise and hits Server C (position 7200)
- key4 (position 9500) walks clockwise, wraps around, and hits Server A (position 1000)
What Happens Now When a Server Leaves
Let’s say Server B goes down. Only the keys that were assigned to Server B need to move. They walk clockwise and land on Server C instead.
```mermaid
graph TB
subgraph "Server B Removed"
direction TB
A2["Server A<br/>(position 1000)"]
C2["Server C<br/>(position 7200)"]
K1b["key1<br/>(pos 800)"]
K2b["key2 ← moved<br/>(pos 2300)"]
K3b["key3<br/>(pos 5100)"]
K4b["key4<br/>(pos 9500)"]
K1b -->|clockwise| A2
K2b -->|clockwise| C2
K3b -->|clockwise| C2
K4b -->|clockwise| A2
end
style A2 fill:#dbeafe,stroke:#3b82f6,stroke-width:2px
style C2 fill:#fef3c7,stroke:#f59e0b,stroke-width:2px
style K2b fill:#fee2e2,stroke:#dc2626,stroke-width:1px
style K1b fill:#f3f4f6,stroke:#6b7280,stroke-width:1px
style K3b fill:#f3f4f6,stroke:#6b7280,stroke-width:1px
style K4b fill:#f3f4f6,stroke:#6b7280,stroke-width:1px
```
key1 still goes to Server A. key3 still goes to Server C. key4 still goes to Server A. Only key2 moved from Server B to Server C.
That’s the magic. Instead of every key remapping, only the keys between the failed server and its predecessor need to move. Everything else stays put.
What Happens When a Server Joins
When a new Server D joins at position 6000 (between Server B and Server C), it takes over only the keys between its predecessor (Server B at 4500) and itself (6000).
```mermaid
sequenceDiagram
participant Ring as Hash Ring
participant D as Server D (new, pos 6000)
participant C as Server C (pos 7200)
Note over Ring: Server D joins at position 6000
Ring->>C: Which keys are between pos 4500 and 6000?
C-->>D: Transfer key3 (pos 5100)
Note over D: key3 now belongs to Server D
Note over C: All other keys unchanged
```
key3 (position 5100) used to go to Server C. Now it goes to Server D. Every other key stays where it was.
The Math
With traditional hashing and N servers, removing one server moves roughly (N-1)/N of all keys. For 10 servers, that’s 90% of your keys.
With consistent hashing, removing one server moves only 1/N of all keys. For 10 servers, that’s 10%.
| Servers | Traditional: keys moved on failure | Consistent: keys moved on failure |
|---|---|---|
| 5 | 80% | 20% |
| 10 | 90% | 10% |
| 50 | 98% | 2% |
| 100 | 99% | 1% |
The bigger your cluster, the better consistent hashing gets.
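You can sanity-check the traditional-hashing column with a small simulation. The sketch below (the key format and the choice of CRC32 are arbitrary) hashes a million synthetic keys and counts how many map to a different server when the cluster shrinks from N to N-1 under hash(key) % N:

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class ModuloRemapDemo {

    // Fraction of keys whose server changes when the cluster
    // shrinks from n to n-1 under hash(key) % n.
    static double movedFraction(int n, int keys) {
        CRC32 crc = new CRC32();
        int moved = 0;
        for (int i = 0; i < keys; i++) {
            crc.reset();
            crc.update(("user:" + i).getBytes(StandardCharsets.UTF_8));
            long h = crc.getValue();
            if (h % n != h % (n - 1)) moved++;
        }
        return (double) moved / keys;
    }

    public static void main(String[] args) {
        for (int n : new int[]{5, 10, 50, 100}) {
            System.out.printf("%3d servers: %.1f%% of keys moved%n",
                    n, 100 * movedFraction(n, 1_000_000));
        }
    }
}
```

The printed fractions land close to the (N-1)/N column in the table above.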
The Problem With Basic Consistent Hashing: Uneven Distribution
The basic version of consistent hashing has a flaw. With only a few servers, they might cluster together on the ring, leaving large gaps.
Imagine 3 servers that happen to hash close together:
```mermaid
graph TB
subgraph "Uneven Distribution"
direction TB
SA["Server A (pos 100)"]
SB["Server B (pos 200)"]
SC["Server C (pos 300)"]
Range1["A handles: 300 → 100<br/>virtually the entire ring"]
Range2["B handles: 100 → 200<br/>a tiny sliver"]
Range3["C handles: 200 → 300<br/>a tiny sliver"]
end
style SA fill:#fee2e2,stroke:#dc2626,stroke-width:2px
style SB fill:#dbeafe,stroke:#3b82f6,stroke-width:2px
style SC fill:#dcfce7,stroke:#16a34a,stroke-width:2px
style Range1 fill:#fee2e2,stroke:#dc2626,stroke-width:1px
style Range2 fill:#dbeafe,stroke:#3b82f6,stroke-width:1px
style Range3 fill:#dcfce7,stroke:#16a34a,stroke-width:1px
```
Server A handles almost all the keys. Servers B and C are practically idle. This creates a hotspot on Server A and wastes the capacity of B and C.
Even with a good hash function, when you only have 3 to 5 servers, the distribution is unlikely to be perfectly even. Statistics only work in your favor with large numbers, and you don’t have large numbers of servers.
Virtual Nodes: The Fix
Virtual nodes (vnodes) solve the uneven distribution problem. Instead of placing each server at one position on the ring, you place it at many positions.
Each physical server gets multiple virtual nodes spread across the ring. Server A might get positions at 500, 3200, 7800, and 9100. Server B might get positions at 1200, 4500, 6700, and 8900.
```mermaid
graph TB
subgraph "Virtual Nodes on the Ring"
direction TB
VA1["A-vn1<br/>(pos 500)"]
VB1["B-vn1<br/>(pos 1200)"]
VA2["A-vn2<br/>(pos 3200)"]
VB2["B-vn2<br/>(pos 4500)"]
VC1["C-vn1<br/>(pos 5800)"]
VB3["B-vn3<br/>(pos 6700)"]
VA3["A-vn3<br/>(pos 7800)"]
VC2["C-vn2<br/>(pos 8500)"]
VB4["B-vn4<br/>(pos 8900)"]
VA4["A-vn4<br/>(pos 9100)"]
end
style VA1 fill:#dbeafe,stroke:#3b82f6,stroke-width:2px
style VA2 fill:#dbeafe,stroke:#3b82f6,stroke-width:2px
style VA3 fill:#dbeafe,stroke:#3b82f6,stroke-width:2px
style VA4 fill:#dbeafe,stroke:#3b82f6,stroke-width:2px
style VB1 fill:#dcfce7,stroke:#16a34a,stroke-width:2px
style VB2 fill:#dcfce7,stroke:#16a34a,stroke-width:2px
style VB3 fill:#dcfce7,stroke:#16a34a,stroke-width:2px
style VB4 fill:#dcfce7,stroke:#16a34a,stroke-width:2px
style VC1 fill:#fef3c7,stroke:#f59e0b,stroke-width:2px
style VC2 fill:#fef3c7,stroke:#f59e0b,stroke-width:2px
```
With 4 virtual nodes per server, the ring has 12 positions instead of 3. The gaps are much smaller and more uniform. More virtual nodes means better distribution.
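You can also measure the effect directly. The sketch below assumes the ConsistentHashRing class from the implementation section later in this post; it routes a million keys across 3 servers and prints how many land on each one as the vnode count grows:

```java
import java.util.Map;
import java.util.TreeMap;

public class VnodeDistributionDemo {
    public static void main(String[] args) {
        for (int vnodes : new int[]{1, 10, 150}) {
            // ConsistentHashRing is defined in the implementation section below
            ConsistentHashRing<String> ring = new ConsistentHashRing<>(vnodes);
            ring.addServer("server-A");
            ring.addServer("server-B");
            ring.addServer("server-C");

            Map<String, Integer> keysPerServer = new TreeMap<>();
            for (int i = 0; i < 1_000_000; i++) {
                keysPerServer.merge(ring.getServer("key:" + i), 1, Integer::sum);
            }
            System.out.println(vnodes + " vnodes/server: " + keysPerServer);
        }
    }
}
```

With 1 vnode per server the counts are typically lopsided; by 150 they converge toward an even third each.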
How Many Virtual Nodes?
| Virtual nodes per server | Load distribution | Memory overhead |
|---|---|---|
| 1 | Very uneven | Minimal |
| 10 | Somewhat uneven | Low |
| 100 | Mostly even | Moderate |
| 150 to 200 | Even | Moderate |
| 256 (classic Cassandra default) | Very even | Moderate |
| 500+ | Near perfect | Higher |
The sweet spot is typically 100 to 200 virtual nodes per physical server. Cassandra shipped with 256 by default for years, though newer versions lower the default num_tokens to 16. Going beyond 500 gives diminishing returns and uses more memory for the ring lookup table.
Virtual Nodes Also Help With Heterogeneous Servers
Not all servers have the same capacity. A machine with 64 GB RAM should handle more data than one with 16 GB. With virtual nodes, you just give the bigger server more vnodes.
- Server A (64 GB): 200 virtual nodes
- Server B (32 GB): 100 virtual nodes
- Server C (16 GB): 50 virtual nodes
Server A gets roughly 4 times the traffic of Server C. The data distribution matches the actual server capacity. No manual sharding rules needed.
Implementation: How It Works in Code
The core data structure is surprisingly simple. You need a sorted map where keys are ring positions and values are server identifiers.
```java
import java.util.*;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class ConsistentHashRing<T> {

    // Ring position -> server. A sorted map makes "first position >= hash"
    // a single O(log n) ceilingEntry lookup.
    private final TreeMap<Long, T> ring = new TreeMap<>();
    private final int virtualNodes;

    public ConsistentHashRing(int virtualNodes) {
        this.virtualNodes = virtualNodes;
    }

    // Place one virtual node per index on the ring.
    public void addServer(T server) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.put(hash(server.toString() + "-vn" + i), server);
        }
    }

    // Remove all of the server's virtual nodes.
    public void removeServer(T server) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.remove(hash(server.toString() + "-vn" + i));
        }
    }

    public T getServer(String key) {
        if (ring.isEmpty()) return null;
        long hash = hash(key);
        // Find the first server position >= the key's hash
        Map.Entry<Long, T> entry = ring.ceilingEntry(hash);
        // If no server is found clockwise, wrap around to the first server
        if (entry == null) {
            entry = ring.firstEntry();
        }
        return entry.getValue();
    }

    // First 4 bytes of MD5 as an unsigned 32-bit ring position (0 to 2^32 - 1).
    private long hash(String key) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(key.getBytes(StandardCharsets.UTF_8));
            return ((long) (digest[0] & 0xFF) << 24) |
                   ((long) (digest[1] & 0xFF) << 16) |
                   ((long) (digest[2] & 0xFF) << 8)  |
                   ((long) (digest[3] & 0xFF));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```
Usage:
```java
ConsistentHashRing<String> ring = new ConsistentHashRing<>(150);
ring.addServer("cache-server-1");
ring.addServer("cache-server-2");
ring.addServer("cache-server-3");

// Route keys to servers
String server = ring.getServer("user:12345");
System.out.println("user:12345 → " + server);

// Add a new server. Most keys stay on the same server.
ring.addServer("cache-server-4");
```
The key operations:
- addServer: O(V * log(N * V)) where V is virtual nodes and N is total servers
- removeServer: O(V * log(N * V))
- getServer (lookup): O(log(N * V))
The lookup is a binary search on a sorted tree, so it’s fast even with thousands of servers and hundreds of virtual nodes per server.
Replication with Consistent Hashing
In production distributed systems, you don’t just store data on one server. You replicate it to handle failures. Consistent hashing makes replication straightforward.
To replicate a key with a replication factor of 3, store it on the server found by walking clockwise, plus the next 2 distinct servers clockwise.
```mermaid
sequenceDiagram
participant C as Client
participant S1 as Server A (primary)
participant S2 as Server B (replica 1)
participant S3 as Server C (replica 2)
Note over C: Write key "user:500"<br/>Hash lands near Server A
C->>S1: Write "user:500" (primary)
S1->>S2: Replicate to next server clockwise
S2->>S3: Replicate to next server clockwise
Note over S1,S3: Key stored on 3 servers
Note over C: If Server A fails...
C->>S2: Read "user:500" (fallback to replica)
S2-->>C: Return data
```
This is essentially how Dynamo and Cassandra handle replication. Walk clockwise on the ring, skip virtual nodes that belong to the same physical server, and replicate to the next N distinct physical servers.
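In code, replica selection can be sketched as one more method on the ConsistentHashRing class above (getReplicas is a hypothetical addition, not part of any library): walk clockwise from the key's position and collect servers, skipping vnodes of physical servers already chosen.

```java
// Hypothetical addition to ConsistentHashRing: the first `replicas`
// distinct physical servers clockwise from the key's position.
// (List, ArrayList, Set, HashSet come from the class's java.util.* import.)
public List<T> getReplicas(String key, int replicas) {
    List<T> result = new ArrayList<>();
    Set<T> seen = new HashSet<>();
    long hash = hash(key);
    // Walk from the key's position to the end of the ring, then wrap around.
    for (Map.Entry<Long, T> entry : ring.tailMap(hash).entrySet()) {
        if (seen.add(entry.getValue())) result.add(entry.getValue());
        if (result.size() == replicas) return result;
    }
    for (Map.Entry<Long, T> entry : ring.headMap(hash).entrySet()) {
        if (seen.add(entry.getValue())) result.add(entry.getValue());
        if (result.size() == replicas) return result;
    }
    return result; // cluster has fewer distinct servers than requested replicas
}
```

The first entry in the returned list is the primary; the rest are the replicas.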
When combined with gossip protocol for failure detection, the system can automatically route reads to replicas when the primary server is down.
Real World: Who Uses Consistent Hashing
Amazon DynamoDB
Dynamo, the Amazon system that inspired DynamoDB, is probably the most famous user of consistent hashing. The original Dynamo paper from 2007 described how Amazon used consistent hashing to partition data across storage nodes.
Each DynamoDB partition is assigned a range on the hash ring. When you write an item, DynamoDB hashes its partition key to find which partition (and which storage node) owns it. When nodes join or leave, only the affected range of the ring gets rebalanced.
The Dynamo design uses virtual nodes and a preference list (an ordered list of nodes responsible for a key range) to handle replication and failure recovery.
Apache Cassandra
Cassandra uses consistent hashing as its core data distribution mechanism. Every row in Cassandra has a partition key, and the partition key is hashed (using Murmur3 by default) to determine which node stores it.
Cassandra historically used 256 virtual nodes per server by default (newer versions lower the default to 16). This gives a fairly even distribution even with clusters of just 3 to 5 nodes. The token range each node owns is visible and can be manually adjusted.
When a new node joins, it takes over token ranges from existing nodes. Gossip protocol propagates the new token assignments to the rest of the cluster.
Memcached
Memcached client libraries use consistent hashing to decide which cache server stores each key (the ketama algorithm is the classic implementation). When you call memcached.set("user:123", data), the client hashes "user:123" and uses the hash ring to find the target server.
This is a client-side implementation of consistent hashing. The servers don’t know about each other. The client maintains the ring and routes requests accordingly.
The big win: when a Memcached server goes down, only the keys on that server are lost. All other servers keep their data intact. Without consistent hashing, losing one server would invalidate almost every cache entry.
CDNs (Akamai, Cloudflare)
Content Delivery Networks use consistent hashing to route content requests to edge servers. When a user requests a file, the CDN hashes the URL to determine which edge server should cache that content.
This ensures that the same content is consistently served from the same edge server, maximizing cache hit rates. When edge servers are added or removed (for maintenance or scaling), only a fraction of content needs to be re-fetched from the origin.
The original consistent hashing paper by Karger et al. (1997) was motivated by exactly this web caching use case. David Karger and his colleagues at MIT developed consistent hashing to solve the distributed web caching problem, and the work went on to become a foundation of Akamai.
Discord
Discord uses consistent hashing to route messages to the correct server in their chat infrastructure. Each guild (server in Discord terms) is mapped to a specific backend server using consistent hashing. This ensures all messages for a guild go to the same server, which maintains the in-memory state for that guild.
Consistent Hashing in System Design Interviews
Consistent hashing shows up in a lot of system design interview questions. Here are the common scenarios where you should bring it up:
Distributed Cache Design
When designing a distributed cache, consistent hashing is the go-to approach for key distribution. The interviewer wants to hear that you understand why hash(key) % N breaks during scaling and how consistent hashing fixes it.
Key points to mention:
- Hash ring with virtual nodes for even distribution
- Only K/N keys move when a server is added or removed
- Client-side vs server-side ring management
- Cache hit rate stays high during scaling events
Database Sharding
When sharding a database across multiple servers, consistent hashing determines which shard holds which data. This is preferable to range-based sharding when you want even distribution and the ability to add shards without rebalancing everything.
Load Balancer Design
For sticky sessions (routing the same user to the same backend server), consistent hashing on the user ID or session ID gives you persistence without a session store. If a backend server goes down, only the sessions on that server are affected.
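With a ring in place, sticky routing needs no extra machinery. A quick sketch reusing the ConsistentHashRing class from the implementation section (the backend names are illustrative):

```java
ConsistentHashRing<String> backends = new ConsistentHashRing<>(150);
backends.addServer("backend-1");
backends.addServer("backend-2");
backends.addServer("backend-3");

// The same user always hashes to the same backend, so the session
// sticks without a shared session store.
String backend = backends.getServer("user:8675309");
```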
URL Shortener / Pastebin
When designing systems like URL shorteners, you need to distribute data across multiple storage nodes. Consistent hashing on the short URL key distributes writes evenly and makes it easy to add more storage servers as the system grows.
Consistent Hashing vs Other Partitioning Strategies
| Strategy | Key Movement on Scaling | Distribution | Complexity | Use Case |
|---|---|---|---|---|
| Modulo hashing | Nearly all keys | Even (when stable) | Low | Small, fixed clusters |
| Range-based | Only affected range | Can be uneven | Medium | Ordered data (time series) |
| Consistent hashing | Only K/N keys | Even (with vnodes) | Medium | Dynamic clusters, caches |
| Rendezvous hashing | Only K/N keys | Even | Medium | When ring overhead matters |
| Directory-based | Depends on rules | Configurable | High | Complex routing needs |
Consistent hashing is the best fit when your cluster size changes frequently and you want to minimize data movement. Range-based partitioning is better when you need ordered scans across a range of keys.
Common Pitfalls
1. Using Too Few Virtual Nodes
With only 1 virtual node per server, distribution is poor. One server might handle 60% of keys while another handles 5%. Use at least 100 virtual nodes per server.
2. Ignoring the Cost of Data Migration
Consistent hashing minimizes which keys move, but those keys still need to be physically transferred. If Server B fails and its 1 million keys need to move to Server C, that migration takes time and bandwidth. Plan for it.
Fix: Use replication. If every key is stored on 3 servers, you don't need to transfer anything at the moment of failure; the replicas already have the data. You still re-replicate in the background to restore the replication factor.
3. Hot Keys
Consistent hashing distributes keys evenly, but it can’t fix hotspot keys. If one key gets 10,000 times more reads than others (think a viral tweet), the server holding that key gets overloaded regardless of how evenly the ring distributes other keys.
Fix: Use read replicas for hot keys, or split hot keys into sub-keys (e.g., popular_post:1:shard_0 through popular_post:1:shard_9). For read-heavy keys, copy the value to every sub-key and read a random one; for write-heavy keys like counters, write partial values to the sub-keys and merge them on read.
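Here is one way the read-heavy variant could look. Everything in this sketch is hypothetical (the Cache interface, key format, and shard count): writes fan the value out to every sub-key, and each read picks a random sub-key, so the load spreads across ring positions.

```java
import java.util.concurrent.ThreadLocalRandom;

public class HotKeySplitter {
    // Hypothetical minimal cache client interface for this sketch.
    interface Cache {
        void set(String key, String value);
        String get(String key);
    }

    static final int SHARDS = 10;

    // On write: copy the hot value to every sub-key. Each sub-key
    // hashes to a different ring position, likely a different server.
    static void writeHot(Cache cache, String key, String value) {
        for (int i = 0; i < SHARDS; i++) {
            cache.set(key + ":shard_" + i, value);
        }
    }

    // On read: pick any shard at random.
    static String readHot(Cache cache, String key) {
        int shard = ThreadLocalRandom.current().nextInt(SHARDS);
        return cache.get(key + ":shard_" + shard);
    }
}
```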
4. Ring Metadata Consistency
Every client or node needs the same view of the ring. If Client A thinks the ring has 5 servers and Client B thinks it has 4, they will route the same key to different servers.
Fix: Use a coordination service like ZooKeeper or etcd to store the ring configuration. Or use gossip protocol to propagate ring changes, as Cassandra does.
5. Not Accounting for Server Heterogeneity
If all servers get the same number of virtual nodes but have different capacities, smaller servers get overwhelmed.
Fix: Assign virtual nodes proportional to server capacity. A server with 2x the RAM gets 2x the virtual nodes.
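As a sketch of what that looks like, the ring can track a per-server vnode count instead of a single global one. The class below is a self-contained illustration (CRC32 stands in for a stronger hash to keep it short; the names and vnode counts mirror the heterogeneous-server example earlier):

```java
import java.nio.charset.StandardCharsets;
import java.util.*;
import java.util.zip.CRC32;

// Capacity-weighted virtual nodes: bigger servers get more ring positions.
public class WeightedRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final Map<String, Integer> vnodesByServer = new HashMap<>();

    public void addServer(String server, int vnodes) {
        vnodesByServer.put(server, vnodes); // remember the count for removal
        for (int i = 0; i < vnodes; i++) {
            ring.put(hash(server + "-vn" + i), server);
        }
    }

    public void removeServer(String server) {
        Integer vnodes = vnodesByServer.remove(server);
        if (vnodes == null) return;
        for (int i = 0; i < vnodes; i++) {
            ring.remove(hash(server + "-vn" + i));
        }
    }

    public String getServer(String key) {
        if (ring.isEmpty()) return null;
        Map.Entry<Long, String> entry = ring.ceilingEntry(hash(key));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    private long hash(String s) {
        CRC32 crc = new CRC32(); // any reasonably uniform hash works
        crc.update(s.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    public static void main(String[] args) {
        WeightedRing ring = new WeightedRing();
        ring.addServer("server-A-64gb", 200); // ~4x the keys of server C
        ring.addServer("server-B-32gb", 100);
        ring.addServer("server-C-16gb", 50);
        System.out.println(ring.getServer("user:12345"));
    }
}
```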
When to Use Consistent Hashing
Use consistent hashing when:
- Your cluster scales up and down (servers are added or removed)
- You want to minimize data movement during scaling
- You need to distribute data or load across multiple servers
- You are building a distributed cache, database, or CDN
- You need sticky routing (same key always goes to the same server)
Don’t use consistent hashing when:
- You have a fixed, small number of servers that never change (simple modulo is fine)
- You need range queries on keys (use range-based partitioning)
- Your data has a natural partition key that maps to specific servers (use directory-based routing)
- You have a single database server (no partitioning needed)
Wrapping Up
Consistent hashing is one of those ideas that seems simple once you understand it, but it solves a real and painful problem in distributed systems. The key takeaways:
- Traditional hashing breaks when cluster size changes: hash(key) % N remaps almost every key
- Consistent hashing uses a ring: Both servers and keys are hashed onto a circle, and keys go to the nearest server clockwise
- Only K/N keys move during scaling: Adding or removing a server affects only a fraction of keys
- Virtual nodes fix uneven distribution: Place each server at multiple ring positions for balanced load
- Replication walks the ring: Store copies on the next N distinct servers clockwise for fault tolerance
- It’s everywhere: DynamoDB, Cassandra, Memcached, CDNs, Discord, and more
The next time you design a system that needs to distribute data across multiple servers, start with consistent hashing. It’s battle tested, well understood, and it will save you from cache stampedes and data reshuffling nightmares.
For more on distributed systems, check out Gossip Protocol for how nodes discover each other, Caching Strategies for cache patterns, and How Kafka Works for distributed messaging.
Building systems that scale? See System Design Cheat Sheet for a complete reference, How Meta Achieves Cache Consistency for distributed caching at scale, and Hash Collisions Explained for the data structure fundamentals behind hashing.
References: Karger et al., "Consistent Hashing and Random Trees" (STOC 1997); DeCandia et al., "Dynamo: Amazon's Highly Available Key-Value Store" (SOSP 2007); Apache Cassandra architecture documentation; Toptal, "A Guide to Consistent Hashing".