Every tweet you post, every Discord message you send, every Instagram photo you upload gets a unique ID. At Twitter’s scale, that’s over 400 million tweets per day. Each needs an ID that’s guaranteed unique across thousands of servers. They can’t just ask a central database “what’s the next number?” because that database would melt under the load.
So in 2010, Twitter’s engineering team built Snowflake. It’s a system that lets any server generate unique IDs on its own, with no coordination, no locking, no single point of failure. And the IDs are sortable by time.
Let’s break down how this works and how you can use it.
The Problem With Traditional ID Generation
Before we get into Snowflake, let’s understand why generating IDs at scale is hard.
Option 1: Auto-increment in your database
Simple and works great for small apps. But at scale:
- Your database becomes a bottleneck
- Every insert requires a round trip to get the next ID
- If that database goes down, you can’t create anything
- Sharding becomes a nightmare
Option 2: Random UUIDs
Generate a 128-bit random number. Problem solved, right? Not quite:
- UUIDs are big (36 characters as strings, 16 bytes binary)
- They’re not sortable by time
- Random distribution kills B-tree index performance
- Your database does more page splits and uses more storage
Option 3: Snowflake IDs
The best of both worlds. Unique like UUIDs, but smaller and time-sortable.
Snowflake ID Structure: The 64-Bit Breakdown
A Snowflake ID packs three pieces of information (a timestamp, a machine ID, and a sequence number) into 64 bits, with the top bit left unused as a sign bit. Here’s what each part does:
| Component | Bits | Purpose | Capacity |
|---|---|---|---|
| Sign bit | 1 | Always 0 (keeps ID positive) | - |
| Timestamp | 41 | Milliseconds since custom epoch | ~69 years |
| Machine ID | 10 | Which server generated this | 1024 machines |
| Sequence | 12 | Counter for same-millisecond IDs | 4096 per ms |
The math works out nicely:
- 1024 machines x 4096 IDs per millisecond = 4+ million IDs per millisecond across your cluster
- That’s over 4 billion IDs per second if you need them
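The layout and the capacity arithmetic above can be checked with a few shifts and masks. This is a minimal Python sketch, not a reference implementation; the names `pack` and `unpack` and the constants are illustrative choices, not part of any library.

```python
from typing import Tuple

# Field widths from the table above (the sign bit stays 0).
TIMESTAMP_BITS = 41
MACHINE_ID_BITS = 10
SEQUENCE_BITS = 12

MAX_MACHINE_ID = (1 << MACHINE_ID_BITS) - 1   # 1023
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1       # 4095


def pack(timestamp_ms: int, machine_id: int, sequence: int) -> int:
    """Combine the three fields into one 64-bit integer."""
    if not 0 <= timestamp_ms < (1 << TIMESTAMP_BITS):
        raise ValueError("timestamp does not fit in 41 bits")
    if not 0 <= machine_id <= MAX_MACHINE_ID:
        raise ValueError("machine_id does not fit in 10 bits")
    if not 0 <= sequence <= MAX_SEQUENCE:
        raise ValueError("sequence does not fit in 12 bits")
    return (
        (timestamp_ms << (MACHINE_ID_BITS + SEQUENCE_BITS))
        | (machine_id << SEQUENCE_BITS)
        | sequence
    )


def unpack(snowflake_id: int) -> Tuple[int, int, int]:
    """Split an ID back into (timestamp_ms, machine_id, sequence)."""
    sequence = snowflake_id & MAX_SEQUENCE
    machine_id = (snowflake_id >> SEQUENCE_BITS) & MAX_MACHINE_ID
    timestamp_ms = snowflake_id >> (MACHINE_ID_BITS + SEQUENCE_BITS)
    return timestamp_ms, machine_id, sequence


# Capacity figures quoted in the text.
ids_per_ms = (MAX_MACHINE_ID + 1) * (MAX_SEQUENCE + 1)
ids_per_second = ids_per_ms * 1000
years_of_timestamps = (1 << TIMESTAMP_BITS) / (1000 * 60 * 60 * 24 * 365.25)
```

Here `timestamp_ms` is the offset from the custom epoch, not Unix time. Packing the largest value of every field gives `2**63 - 1`, which is why the result always fits in a signed 64-bit integer.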
How Snowflake ID Generation Works
When a server needs to generate an ID, it follows this process:
sequenceDiagram
participant App as Application
participant Gen as ID Generator
participant Clock as System Clock
App->>Gen: Generate new ID
Gen->>Clock: Get current time
Clock-->>Gen: 1705234567890 ms
Gen->>Gen: Compare with last timestamp
alt Same millisecond as last ID
Gen->>Gen: Increment sequence (0, 1, 2...)
Note over Gen: If sequence hits 4096,<br/>wait for next millisecond
else New millisecond
Gen->>Gen: Reset sequence to 0
end
Gen->>Gen: Combine bits:<br/>timestamp | machine_id | sequence
Gen-->>App: Return 7155791234567890123
The key insight is that no coordination is needed between servers. Each server:
- Knows its own machine ID (assigned at startup)
- Can read its own clock
- Tracks its own sequence number
As long as machine IDs are unique across your cluster, the generated IDs are guaranteed unique.
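The generation loop described above can be sketched in a few lines of Python. This is an illustrative, single-process version under stated assumptions: the class name `SnowflakeGenerator`, the injectable `clock` parameter, and the 2021 epoch are choices made for this example, and a backward clock simply raises an error.

```python
import threading
import time
from typing import Callable


def unix_ms() -> int:
    """Current Unix time in milliseconds."""
    return time.time_ns() // 1_000_000


class SnowflakeGenerator:
    """Snowflake-style generator using the 41/10/12 bit layout."""

    def __init__(self, machine_id: int, epoch_ms: int,
                 clock: Callable[[], int] = unix_ms):
        if not 0 <= machine_id < 1024:
            raise ValueError("machine_id must be in 0..1023")
        self._machine_id = machine_id
        self._epoch_ms = epoch_ms
        self._clock = clock
        self._last_ms = -1
        self._sequence = 0
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                # Same millisecond: bump the 12-bit counter.
                self._sequence = (self._sequence + 1) & 0xFFF
                if self._sequence == 0:
                    # 4096 IDs already issued this millisecond: spin until the next one.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                # New millisecond: restart the counter.
                self._sequence = 0
            self._last_ms = now
            return ((now - self._epoch_ms) << 22) | (self._machine_id << 12) | self._sequence


gen = SnowflakeGenerator(machine_id=42, epoch_ms=1_609_459_200_000)
ids = [gen.next_id() for _ in range(10_000)]
```

Nothing in `next_id` talks to another server: the only shared knowledge is the epoch and the fact that no other process uses machine ID 42.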
Twitter vs Discord: Different Epochs, Same Idea
Both Twitter and Discord use Snowflake IDs, but they chose different starting points (epochs):
| Platform | Epoch | Epoch Date |
|---|---|---|
| Twitter | 1288834974657 | Nov 4, 2010 |
| Discord | 1420070400000 | Jan 1, 2015 |
Why does the epoch matter?
The 41-bit timestamp stores milliseconds since the epoch. Starting from a later date means:
- Smaller initial IDs (they fit in fewer digits)
- More years before the timestamp bits run out
Twitter’s epoch falls in November 2010, around the time Snowflake went into service, not at the company’s 2006 launch. Discord picked the start of 2015, when they were building their system.
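To see what the epoch choice means in practice, the following sketch computes when a 41-bit millisecond counter runs out for each epoch in the table. The helper name `exhaustion_date` is hypothetical; the epoch values are the ones listed above.

```python
from datetime import datetime, timedelta, timezone

TIMESTAMP_BITS = 41
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

EPOCHS_MS = {
    "Twitter": 1288834974657,
    "Discord": 1420070400000,
}


def exhaustion_date(epoch_ms: int) -> datetime:
    """First instant whose offset from the epoch no longer fits in 41 bits."""
    return UNIX_EPOCH + timedelta(milliseconds=epoch_ms + (1 << TIMESTAMP_BITS))


for platform, epoch_ms in EPOCHS_MS.items():
    start = UNIX_EPOCH + timedelta(milliseconds=epoch_ms)
    print(platform, start.date(), "->", exhaustion_date(epoch_ms).date())
```

With these epochs the timestamp field lasts until 2080 for Twitter and 2084 for Discord. Had either system counted from the Unix epoch instead, the field would already be full in 2039.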
Discord Snowflake ID length in digits:
Discord Snowflake IDs typically range from 17 to 19 digits when displayed as decimal numbers. For example:
- An early Discord ID: 81384788765712384 (17 digits)
- A recent Discord ID: 1234567890123456789 (19 digits)
The exact length depends on when the ID was created. Older accounts have shorter IDs.
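The link between digit count and creation date follows directly from the bit layout. As a rough sketch (the function name `first_moment_with_digits` is made up for this example), you can ask when the smallest ID with a given number of digits could first have been generated:

```python
from datetime import datetime, timedelta, timezone

DISCORD_EPOCH_MS = 1420070400000
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def first_moment_with_digits(digits: int) -> datetime:
    """Approximate UTC time at which Discord IDs first reached `digits` decimal digits."""
    smallest_id = 10 ** (digits - 1)
    # Dropping the low 22 bits leaves the millisecond offset from the Discord epoch.
    timestamp_ms = (smallest_id >> 22) + DISCORD_EPOCH_MS
    return UNIX_EPOCH + timedelta(milliseconds=timestamp_ms)


for digits in (17, 18, 19):
    print(digits, "digits from about", first_moment_with_digits(digits).date())
```

Under this layout, 18-digit IDs begin in October 2015 and 19-digit IDs in July 2022, which matches the observation that older accounts have shorter IDs.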
Extracting the Timestamp: Decoding Snowflake IDs
One of the most useful features of Snowflake IDs is that you can extract the creation time just by looking at the ID. No database lookup required.
For Discord IDs (JavaScript):
function getDiscordTimestamp(snowflakeId) {
  // Discord epoch: January 1, 2015 (UTC), in milliseconds
  const DISCORD_EPOCH = 1420070400000n;
  // Convert to BigInt (IDs exceed Number.MAX_SAFE_INTEGER, so pass them as strings)
  const id = BigInt(snowflakeId);
  // Right-shift by 22 bits to drop the machine ID and sequence, leaving the timestamp
  const timestamp = (id >> 22n) + DISCORD_EPOCH;
  return new Date(Number(timestamp));
}

// Example: Get when a Discord user account was created
const userId = "123456789012345678";
console.log(getDiscordTimestamp(userId));
// Output: 2015-12-07T16:13:12.216Z
For Twitter IDs (Python):
from datetime import datetime, timezone

def get_twitter_timestamp(snowflake_id):
    # Twitter epoch: November 4, 2010 (UTC), in milliseconds
    TWITTER_EPOCH = 1288834974657
    # Right-shift by 22 bits to get the timestamp offset, then add the epoch
    timestamp_ms = (snowflake_id >> 22) + TWITTER_EPOCH
    # Return a timezone-aware UTC datetime rather than server-local time
    return datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)

# Example: Get when a tweet was posted
tweet_id = 1541815603606036480
print(get_twitter_timestamp(tweet_id))
# Output: 2022-06-28 16:07:40.105000+00:00
This is incredibly useful for:
- Sorting content by creation time without a separate timestamp column
- Debugging (when was this record created?)
- Time-based sharding and partitioning
- Rate limiting by creation time
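The same shifts recover the other two fields, and running the conversion in reverse gives ID bounds for time-range queries. The sketch below assumes the 41/10/12 layout; `decode` and `min_id_at` are hypothetical helper names, not functions from any Twitter or Discord library.

```python
from datetime import datetime, timedelta, timezone
from typing import NamedTuple

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
TWITTER_EPOCH_MS = 1288834974657


class DecodedSnowflake(NamedTuple):
    created_at: datetime
    machine_id: int
    sequence: int


def decode(snowflake_id: int, epoch_ms: int) -> DecodedSnowflake:
    """Split a 41/10/12 Snowflake into its three fields."""
    timestamp_ms = (snowflake_id >> 22) + epoch_ms
    return DecodedSnowflake(
        created_at=UNIX_EPOCH + timedelta(milliseconds=timestamp_ms),
        machine_id=(snowflake_id >> 12) & 0x3FF,
        sequence=snowflake_id & 0xFFF,
    )


def min_id_at(moment: datetime, epoch_ms: int) -> int:
    """Smallest possible ID created at `moment`; usable as a range-query bound."""
    unix_ms = (moment - UNIX_EPOCH) // timedelta(milliseconds=1)
    return (unix_ms - epoch_ms) << 22


tweet_id = 1541815603606036480
decoded = decode(tweet_id, TWITTER_EPOCH_MS)

# Every ID created in that millisecond falls in [lower, upper).
lower = min_id_at(decoded.created_at, TWITTER_EPOCH_MS)
upper = min_id_at(decoded.created_at + timedelta(milliseconds=1), TWITTER_EPOCH_MS)
```

For this tweet ID the sketch yields 2022-06-28 16:07:40.105 UTC, machine ID 378, and sequence 0. A query such as "all rows created after time T" then becomes a plain comparison against `min_id_at(T, epoch)` on the primary key.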
Snowflake ID Generation in Java
For production Java applications, use a battle-tested library. Here’s a simple implementation to understand the concepts:
public class SnowflakeIdGenerator {
    private static final long EPOCH = 1609459200000L; // Jan 1, 2021 (UTC)

    private static final long MACHINE_ID_BITS = 10L;
    private static final long SEQUENCE_BITS = 12L;

    private static final long MAX_MACHINE_ID = ~(-1L << MACHINE_ID_BITS); // 1023
    private static final long MAX_SEQUENCE = ~(-1L << SEQUENCE_BITS);     // 4095

    private static final long MACHINE_ID_SHIFT = SEQUENCE_BITS;
    private static final long TIMESTAMP_SHIFT = SEQUENCE_BITS + MACHINE_ID_BITS;

    private final long machineId;
    private long lastTimestamp = -1L;
    private long sequence = 0L;

    public SnowflakeIdGenerator(long machineId) {
        if (machineId < 0 || machineId > MAX_MACHINE_ID) {
            throw new IllegalArgumentException(
                "Machine ID must be between 0 and " + MAX_MACHINE_ID);
        }
        this.machineId = machineId;
    }

    public synchronized long nextId() {
        long currentTimestamp = System.currentTimeMillis();

        if (currentTimestamp < lastTimestamp) {
            throw new RuntimeException("Clock moved backwards!");
        }

        if (currentTimestamp == lastTimestamp) {
            // Same millisecond: advance the counter, wrapping at 4096
            sequence = (sequence + 1) & MAX_SEQUENCE;
            if (sequence == 0) {
                // Sequence exhausted, wait for next millisecond
                currentTimestamp = waitNextMillis(lastTimestamp);
            }
        } else {
            sequence = 0L;
        }

        lastTimestamp = currentTimestamp;

        return ((currentTimestamp - EPOCH) << TIMESTAMP_SHIFT)
            | (machineId << MACHINE_ID_SHIFT)
            | sequence;
    }

    // Busy-wait until the system clock reports a later millisecond
    private long waitNextMillis(long lastTimestamp) {
        long timestamp = System.currentTimeMillis();
        while (timestamp <= lastTimestamp) {
            timestamp = System.currentTimeMillis();
        }
        return timestamp;
    }
}
Using it:
// Each server gets a unique machine ID (0-1023)
SnowflakeIdGenerator generator = new SnowflakeIdGenerator(42);
// Generate IDs
long orderId = generator.nextId();
long userId = generator.nextId();
System.out.println("Order ID: " + orderId); // actual value depends on the current time
System.out.println("User ID: " + userId);   // always greater than orderId
Production-ready libraries for Java:
- callicoder/java-snowflake - Clean implementation with good documentation
- phxql/snowflake-id - Customizable bit allocation
- HMWCS/java-snowflake-id-generator - Optimized with CAS for high throughput
Snowflake ID vs UUID: When to Use Which
Here’s a direct comparison:
| Feature | Snowflake ID | UUID v4 |
|---|---|---|
| Size | 64 bits (8 bytes) | 128 bits (16 bytes) |
| As string | 19 digits max | 36 characters |
| Time-sortable | Yes | No |
| Index performance | Excellent | Poor |
| Coordination needed | Machine ID only | None |
| Collision risk | Zero (with proper setup) | Near zero |
| Extract creation time | Yes | No |
| JavaScript safe | Need BigInt | String representation |
Choose Snowflake IDs when:
- You need IDs sorted by creation time
- Database index performance matters
- You’re building a distributed system with known nodes
- Storage size is a concern
- You want to extract timestamps from IDs
Choose UUIDs when:
- You need true decentralization (no machine ID coordination)
- Working with systems that expect UUIDs
- Simpler setup is more important than performance
- You’re using UUID-native databases like PostgreSQL
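The size rows of the comparison are easy to confirm with the Python standard library. This is only a quick check of the figures in the table; the Snowflake value is the tweet ID used earlier in the article.

```python
import struct
import uuid

snowflake_id = 1541815603606036480
random_uuid = uuid.uuid4()

# Binary size: a signed 64-bit integer versus the UUID's 16 raw bytes.
snowflake_binary = struct.pack(">q", snowflake_id)
uuid_binary = random_uuid.bytes

# Text size: decimal digits versus the hyphenated hex form.
snowflake_text = str(snowflake_id)
uuid_text = str(random_uuid)

# Largest possible Snowflake (sign bit 0) and JavaScript's safe-integer limit.
MAX_SNOWFLAKE = (1 << 63) - 1
JS_MAX_SAFE_INTEGER = (1 << 53) - 1
needs_bigint = snowflake_id > JS_MAX_SAFE_INTEGER

print(len(snowflake_binary), len(snowflake_text))  # 8 19
print(len(uuid_binary), len(uuid_text))            # 16 36
```

The last check is the reason for the "Need BigInt" entry: any Snowflake above 2^53 - 1 loses precision if a JSON parser reads it as a JavaScript number, so APIs commonly send these IDs as strings.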
The Clock Skew Problem
Snowflake IDs depend on accurate system clocks. What happens if a server’s clock goes backward?
The problem:
Server time: 1705234567890
Generate ID with timestamp: 1705234567890
Clock drifts backward (NTP sync, manual change, etc.)
Server time: 1705234567880
Generate ID with timestamp: 1705234567880 <- This could duplicate an old ID!
How to handle it:
- Refuse to generate - Most implementations throw an error if the clock goes backward. Your monitoring should catch this.
- Wait it out - If the clock jumped back by a small amount (< 5ms), just wait until the clock catches up.
- Use logical clocks - Track the last timestamp and never go backward, even if the system clock does.
Here’s a defensive implementation:
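public synchronized long nextId() {
    long currentTimestamp = System.currentTimeMillis();

    if (currentTimestamp < lastTimestamp) {
        long offset = lastTimestamp - currentTimestamp;
        if (offset < 5) {
            // Small drift, wait it out
            try {
                Thread.sleep(offset);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            currentTimestamp = System.currentTimeMillis();
            if (currentTimestamp < lastTimestamp) {
                // Still behind (sleep interrupted or clock stepped again):
                // fail rather than risk a duplicate ID
                throw new ClockMovedBackException(
                    "Clock still behind after waiting " + offset + "ms");
            }
        } else {
            // Large drift, refuse to generate.
            // ClockMovedBackException is a custom unchecked exception you define yourself.
            throw new ClockMovedBackException(
                "Clock moved backward by " + offset + "ms");
        }
    }

    // Continue with normal generation...
}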
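The third strategy in the list above, a logical clock, can be sketched separately. The `LogicalClock` class below is a hypothetical illustration: it wraps any millisecond clock and refuses to report a value lower than one it has already returned.

```python
class LogicalClock:
    """Millisecond clock that never reports a value lower than one it already returned."""

    def __init__(self, wall_clock_ms):
        self._wall_clock_ms = wall_clock_ms   # callable returning Unix time in ms
        self._last_ms = 0

    def now_ms(self) -> int:
        # If the wall clock stepped backward, keep reporting the last value
        # until real time catches up.
        self._last_ms = max(self._last_ms, self._wall_clock_ms())
        return self._last_ms


# Simulated wall clock that jumps back by 10 ms on the third reading.
readings = iter([1705234567890, 1705234567891, 1705234567881, 1705234567892])
clock = LogicalClock(lambda: next(readings))
observed = [clock.now_ms() for _ in range(4)]
```

Here `observed` ends up as 890, 891, 891, 892 in the last three digits: the backward step is absorbed instead of raising. The trade-off is that the generator keeps consuming sequence numbers in the held millisecond, and the protection is lost on restart unless the last timestamp is persisted somewhere.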
Best practices:
- Use NTP with slew mode (gradual adjustment) instead of step mode
- Monitor clock drift across your servers
- Set up alerts for clock skew > 1 second
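- Consider a tightly synchronized time source if your environment provides one (Spanner’s TrueTime is the best-known example, but it is not a general-purpose public service)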
Machine ID Assignment in the Cloud
In containerized environments, assigning unique machine IDs gets tricky. Containers come and go. Auto-scaling adds new instances.
Option 1: Use a coordination service
Store machine ID assignments in ZooKeeper, etcd, or Consul:
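import org.apache.curator.framework.CuratorFramework;
import org.apache.zookeeper.CreateMode;

public class ZookeeperMachineId {
    private static final String MACHINE_ID_PATH = "/snowflake/machine-ids";

    public int acquireMachineId(CuratorFramework client) throws Exception {
        // Create ephemeral sequential node (ZooKeeper appends a zero-padded counter)
        String path = client.create()
            .creatingParentsIfNeeded()
            .withMode(CreateMode.EPHEMERAL_SEQUENTIAL)
            .forPath(MACHINE_ID_PATH + "/machine-");

        // Extract sequence number from path
        String sequenceStr = path.substring(path.lastIndexOf("-") + 1);

        // Caveat: the counter never resets, so after 1024 allocations the modulo
        // can return an ID that a long-lived node still holds.
        int machineId = Integer.parseInt(sequenceStr) % 1024;
        return machineId;
    }
}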
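When the container dies, its session ends and the ephemeral node disappears. The sequence counter itself never goes back, though, so an ID is only handed out again once the counter wraps modulo 1024. In a long-running cluster, check the live child nodes before trusting the result.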
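One way to avoid depending on a counter is to claim a specific slot and retry on conflict. The sketch below uses an in-memory `InMemoryRegistry` class as a stand-in for the coordination service; it is not a real ZooKeeper, etcd, or Consul client, and `acquire_machine_id` is a hypothetical helper.

```python
class InMemoryRegistry:
    """Stand-in for a coordination service; NOT a real ZooKeeper/etcd client."""

    def __init__(self):
        self._owners = {}

    def try_claim(self, machine_id: int, owner: str) -> bool:
        # A real service would do this atomically, for example by creating an
        # ephemeral node named after the ID and failing if it already exists.
        if machine_id in self._owners:
            return False
        self._owners[machine_id] = owner
        return True

    def release(self, machine_id: int) -> None:
        # With ephemeral nodes this happens automatically when the session ends.
        self._owners.pop(machine_id, None)


def acquire_machine_id(registry: InMemoryRegistry, owner: str, max_ids: int = 1024) -> int:
    """Claim the lowest machine ID that no live instance currently holds."""
    for candidate in range(max_ids):
        if registry.try_claim(candidate, owner):
            return candidate
    raise RuntimeError("all machine IDs are in use")


registry = InMemoryRegistry()
first = acquire_machine_id(registry, "pod-a")
second = acquire_machine_id(registry, "pod-b")
registry.release(first)                     # pod-a shuts down
third = acquire_machine_id(registry, "pod-c")
```

In this run `pod-c` reuses slot 0 after `pod-a` releases it, while `pod-b` keeps slot 1, so no two live instances ever share an ID.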
Option 2: Hash the container identity
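Derive the machine ID from something unique about the container. Hashing into only 1024 slots can collide, so this suits small clusters or setups that also check for duplicates: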
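// Requires: java.net.InetAddress, java.net.UnknownHostException
public int getMachineIdFromEnvironment() {
    // Use pod name in Kubernetes (HOSTNAME defaults to the pod name)
    String podName = System.getenv("HOSTNAME");
    if (podName != null) {
        // floorMod avoids the negative result Math.abs gives for Integer.MIN_VALUE
        return Math.floorMod(podName.hashCode(), 1024);
    }

    // Fallback: use the low 10 bits of the IP address, which vary most
    // between hosts on the same subnet
    try {
        InetAddress addr = InetAddress.getLocalHost();
        byte[] bytes = addr.getAddress();
        int n = bytes.length; // 4 for IPv4, 16 for IPv6
        return ((bytes[n - 2] & 0x03) << 8) | (bytes[n - 1] & 0xFF);
    } catch (UnknownHostException e) {
        throw new RuntimeException("Cannot determine machine ID", e);
    }
}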
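How risky is hashing into 1024 slots? The birthday-problem estimate below gives a rough answer. It assumes the hash spreads names uniformly, which is an idealization (real pod names and `String.hashCode` are not perfectly uniform), and `collision_probability` is a name invented for this sketch.

```python
def collision_probability(instances: int, slots: int = 1024) -> float:
    """Chance that at least two of `instances` uniformly hashed names share a slot."""
    if instances > slots:
        return 1.0
    p_all_distinct = 1.0
    for i in range(instances):
        # The (i+1)-th instance must avoid the i slots already taken.
        p_all_distinct *= (slots - i) / slots
    return 1.0 - p_all_distinct


for n in (5, 10, 20, 38, 100):
    print(f"{n:>3} instances -> {collision_probability(n):.1%} chance of a duplicate machine ID")
```

Under that assumption the odds of a duplicate pass 50% at just 38 instances, so hashing alone is only safe for very small deployments unless it is paired with a uniqueness check.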
Option 3: Use a central ID service
Run a dedicated service that generates Snowflake IDs and exposes them via API. The service handles all the complexity; your apps just call it.
graph TB
subgraph "Application Servers"
A1[App Server 1]
A2[App Server 2]
A3[App Server 3]
end
subgraph "ID Service Cluster"
ID1[ID Service<br/>Machine 1]
ID2[ID Service<br/>Machine 2]
ID3[ID Service<br/>Machine 3]
end
A1 --> ID1
A2 --> ID2
A3 --> ID3
style ID1 fill:#dcfce7,stroke:#16a34a,stroke-width:2px
style ID2 fill:#dcfce7,stroke:#16a34a,stroke-width:2px
style ID3 fill:#dcfce7,stroke:#16a34a,stroke-width:2px
This adds a network hop but centralizes the machine ID management problem.
Real-World Implementations
Here’s how major companies use Snowflake-style IDs:
Twitter (Original Snowflake)
- 41 bits timestamp, 10 bits machine ID, 12 bits sequence
- Open-sourced in 2010, later deprecated their implementation
- Still used for tweet IDs
Discord
- Same bit layout as Twitter but a different epoch (2015); the 10 machine bits are split into a 5-bit worker ID and a 5-bit process ID
- Used for everything: users, servers, channels, messages
- Exposes timestamp extraction in their API docs
Instagram
- Modified structure: 41 bits timestamp, 13 bits shard ID, 10 bits sequence
- Shard ID helps with database partitioning
- Implemented as PostgreSQL stored procedures
Sony (Sonyflake)
- Optimized for more machines, fewer sequence bits
- 39 bits timestamp (in 10 ms units), 8 bits sequence, 16 bits machine ID
- Useful when you have more machines than sequence demand
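The trade-offs between these layouts are easier to compare as numbers. The sketch below models each one with a small `Layout` dataclass (a name made up for this example). The bit widths come from the list above; the 10 ms time unit for Sonyflake is taken from that project's documentation rather than from this article.

```python
from dataclasses import dataclass

MS_PER_YEAR = 1000 * 60 * 60 * 24 * 365.25


@dataclass(frozen=True)
class Layout:
    name: str
    timestamp_bits: int
    node_bits: int        # machine ID or shard ID
    sequence_bits: int
    tick_ms: int = 1      # length of one timestamp unit

    @property
    def lifetime_years(self) -> float:
        return (1 << self.timestamp_bits) * self.tick_ms / MS_PER_YEAR

    @property
    def max_nodes(self) -> int:
        return 1 << self.node_bits

    @property
    def ids_per_node_per_second(self) -> int:
        return (1 << self.sequence_bits) * 1000 // self.tick_ms


LAYOUTS = [
    Layout("Twitter/Discord", 41, 10, 12),
    Layout("Instagram", 41, 13, 10),
    Layout("Sonyflake", 39, 16, 8, tick_ms=10),
]

for layout in LAYOUTS:
    print(f"{layout.name:<16} {layout.lifetime_years:6.1f} years  "
          f"{layout.max_nodes:>6} nodes  {layout.ids_per_node_per_second:>9,} IDs/s per node")
```

The comparison shows the same 63 or 64 bits being spent differently: Sonyflake gives up per-node throughput (25,600 IDs per second) in exchange for 65,536 nodes and roughly 174 years of range.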
Database Performance Impact
Using Snowflake IDs as primary keys gives you significant performance benefits.
Why random UUIDs hurt performance:
graph TB
subgraph "UUID v4 Insert Pattern"
U1["Insert UUID abc..."] --> P1["Page 42"]
U2["Insert UUID xyz..."] --> P2["Page 891"]
U3["Insert UUID def..."] --> P3["Page 7"]
U4["Insert UUID mno..."] --> P4["Page 523"]
end
subgraph "Snowflake Insert Pattern"
S1["Insert ID 7001"] --> Q1["Page 100"]
S2["Insert ID 7002"] --> Q1
S3["Insert ID 7003"] --> Q1
S4["Insert ID 7004"] --> Q1
end
style P1 fill:#fee2e2,stroke:#dc2626
style P2 fill:#fee2e2,stroke:#dc2626
style P3 fill:#fee2e2,stroke:#dc2626
style P4 fill:#fee2e2,stroke:#dc2626
style Q1 fill:#dcfce7,stroke:#16a34a
Random UUIDs scatter inserts across the entire index, causing:
- More disk I/O (different pages for each insert)
- Poor cache utilization
- Frequent page splits
- Index fragmentation
Snowflake IDs insert sequentially (roughly), keeping recent data together:
- Writes hit the same pages repeatedly
- Better cache hit rates
- Fewer page splits
- Natural time-based partitioning
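A toy model makes the difference in insert patterns concrete. The sketch below is not a database benchmark: it keeps keys in a sorted Python list as a stand-in for a B-tree and counts how many inserts land at the right-hand edge, using sequential integers in place of Snowflake IDs and random 128-bit integers in place of UUID v4. The function name `fraction_appended` is hypothetical.

```python
import bisect
import random


def fraction_appended(keys):
    """Share of inserts that land at the right edge of a sorted index."""
    index = []
    appended = 0
    for key in keys:
        position = bisect.bisect_left(index, key)
        if position == len(index):
            appended += 1
        index.insert(position, key)
    return appended / len(keys)


rng = random.Random(0)   # fixed seed so the run is repeatable
n = 5_000
sequential_keys = list(range(n))                         # stands in for Snowflake IDs
random_keys = [rng.getrandbits(128) for _ in range(n)]   # stands in for UUID v4

sequential_share = fraction_appended(sequential_keys)
random_share = fraction_appended(random_keys)
print(f"sequential: {sequential_share:.1%} appended, random: {random_share:.1%} appended")
```

Every sequential key is appended at the end, while almost none of the random keys are. In a real B-tree, that means random keys keep touching pages all over the index instead of the few hot pages at its tail.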
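Illustrative benchmark results (hardware, schema, and database are unspecified, so treat these as rough orders of magnitude; your mileage may vary):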
| Metric | UUID v4 | Snowflake ID |
|---|---|---|
| Insert throughput | ~15,000/sec | ~45,000/sec |
| Index size | 100% | ~65% |
| Point lookup | ~2ms | ~1ms |
| Range query (recent) | ~15ms | ~3ms |
Key Takeaways
1. Snowflake IDs solve the distributed ID problem elegantly. No per-ID coordination between servers, no single point of failure, and IDs are guaranteed unique as long as machine IDs are.
2. The 64-bit structure is carefully designed. 41 bits for timestamp (~69 years), 10 bits for machine ID (1024 nodes), 12 bits for sequence (4096 IDs per millisecond per node).
3. Time-sortability is a major advantage. Sort by ID to sort by creation time. Extract timestamps without database lookups. Enable efficient range queries on recent data.
4. Discord Snowflake IDs are 17-19 digits. The exact length depends on when the ID was created. Use BigInt in JavaScript to handle them safely.
5. Clock synchronization matters. Use NTP, monitor for drift, and handle backward clock jumps gracefully.
6. Machine ID assignment needs thought in containers. Use coordination services, hash container identity, or run a central ID service.
7. Database performance improves significantly. Sequential inserts, better cache utilization, smaller indexes compared to UUIDs.
Snowflake IDs aren’t the answer for every system. But when you’re building something that needs to scale, handle billions of records, and maintain performance, they’re worth understanding.
References: Twitter Snowflake (GitHub Archive), Discord Developer Docs, Instagram Engineering Blog