When you like a friend’s photo on Facebook, that like appears instantly across all their devices, to all their friends, and in all the places that photo appears on the platform. Behind this simple action lies one of the most sophisticated distributed cache consistency systems ever built.
Meta serves 3+ billion users with 99.99999999% cache consistency - roughly one inconsistent read per ten billion operations. They process trillions of cache operations daily across a global fleet of data centers, while keeping data almost perfectly synchronized. This isn’t just about speed - it’s about making sure 3 billion people see the same version of reality.
The core challenges they solved:
- Perfect consistency across global data centers
- Sub-millisecond latency for cache reads
- Instant invalidation when data changes
- Fault tolerance during network partitions
- Linear scaling to billions of operations
Let’s see how they built a system that almost never shows stale data.
Meta’s Cache Architecture: The 30,000-Foot View
Before diving into the consistency mechanisms, here’s how Meta’s cache system looks from above:
```mermaid
graph TB
    subgraph "📱 Client Apps"
        Clients["Web, Mobile, Internal Apps"]
    end
    subgraph "🖥️ Application Servers"
        Apps["Web Servers & API Servers"]
    end
    subgraph "⚡ Cache Layer"
        direction LR
        TAO["TAO<br/>Graph Cache<br/>Social data"]
        Memcache["Memcache<br/>Key-Value Cache<br/>Profile data"]
    end
    subgraph "🗃️ Storage Layer"
        direction LR
        MySQL[("MySQL<br/>Shards")]
        GraphDB[("Graph<br/>Database")]
    end
    subgraph "🔄 Consistency System"
        Invalidator["Cache Invalidation Service<br/>Ensures cache consistency"]
    end

    %% Main request flow
    Clients --> Apps
    Apps --> TAO
    Apps --> Memcache

    %% Cache to storage
    TAO --> GraphDB
    Memcache --> MySQL

    %% Consistency flow (simplified)
    GraphDB --> Invalidator
    MySQL --> Invalidator
    Invalidator --> TAO
    Invalidator --> Memcache
```
Two Key Data Flows:
```mermaid
graph LR
    subgraph "🔥 Hot Path: Write-Through (Strong Consistency)"
        direction TB
        App1[App Write Request]
        App1 --> TAO1[TAO Cache]
        TAO1 -->|1. Write to DB first| GraphDB1[(Graph DB)]
        GraphDB1 -->|2. Then invalidate| TAO1
        TAO1 -->|3. Response| App1
        style TAO1 fill:#ffebee
        style GraphDB1 fill:#e8f5e8
    end
    subgraph "❄️ Cold Path: Write-Behind (Eventual Consistency)"
        direction TB
        App2[App Write Request]
        App2 --> MySQL1[(MySQL)]
        MySQL1 -->|1. DB write triggers| Inval2[Async Invalidator]
        Inval2 -->|2. Invalidate later| Memcache1[Memcache]
        App2 -->|3. Immediate response| Response2[Response]
        style Memcache1 fill:#fff3e0
        style MySQL1 fill:#e8f5e8
    end
```
The Two-Layer Cache Strategy
Meta’s key insight: different data needs different consistency guarantees.
TAO: The Graph Cache (Strong Consistency)
- Social graph data (friends, likes, comments)
- Immediate consistency required
- Write-through caching with instant invalidation
- Handles complex relationships and queries
Memcache: The Data Cache (Eventual Consistency)
- Profile data, posts, media metadata
- Can tolerate brief inconsistency
- Write-behind caching with delayed invalidation
- Optimized for raw throughput
Why split them? Social interactions must be instantly consistent (you can’t “unlike” something that never showed as liked), but profile updates can have eventual consistency.
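To make the split concrete, here’s a minimal sketch of the two write policies (illustrative Python, not Meta’s code - `db`, `cache`, and `queue` stand in for a storage client, a cache client, and a work queue):

```python
class WriteThroughCache:
    """TAO-style: the write hits the database first, then the cached
    entry is invalidated before the caller gets a response."""

    def __init__(self, db, cache):
        self.db, self.cache = db, cache

    def write(self, key, value):
        self.db.write(key, value)   # 1. commit to the source of truth
        self.cache.delete(key)      # 2. invalidate synchronously
        return "ok"                 # 3. respond only after both steps


class WriteBehindCache:
    """Memcache-style: the database write returns immediately and the
    invalidation is queued for an async worker to deliver later."""

    def __init__(self, db, cache, queue):
        self.db, self.cache, self.queue = db, cache, queue

    def write(self, key, value):
        self.db.write(key, value)   # 1. commit to the source of truth
        self.queue.put(key)         # 2. enqueue the invalidation
        return "ok"                 # 3. respond without waiting

    def invalidation_worker(self):
        while True:
            key = self.queue.get()  # drains asynchronously
            self.cache.delete(key)  # readers may see stale data until here
```

The trade is latency for freshness: the write-through path pays for the invalidation inside the request, while the write-behind path returns immediately and lets readers see stale data until the worker catches up.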
Here’s how a like flows through both systems:
```mermaid
sequenceDiagram
    participant User as 👤 User
    participant Web as 🖥️ Web Server
    participant TAO as ⚡ TAO Cache
    participant DB as 🗃️ Database
    participant Invalidator as 🚫 Invalidator
    participant Friends as 👥 Friend Feeds

    User->>Web: Like photo
    Web->>TAO: Check permissions
    TAO->>DB: Read graph data
    Web->>DB: Write like record
    DB->>Invalidator: Trigger invalidation
    Invalidator->>TAO: Invalidate like cache
    Invalidator->>Friends: Invalidate feed caches
    Web->>User: Confirm like

    Note over TAO,Friends: All caches updated before response
```
The TAO Graph Cache: Where Consistency Matters Most
TAO (The Associations and Objects) is Meta’s workhorse for keeping the social graph consistent. It’s a write-through cache that sits between applications and the graph database.
```mermaid
graph LR
    subgraph "TAO Architecture"
        subgraph "TAO Tier 1 (Follower)"
            TAO1[TAO Follower 1]
            TAO2[TAO Follower 2]
            TAO3[TAO Follower 3]
        end
        subgraph "TAO Tier 2 (Leader)"
            Leader[TAO Leader]
        end
        subgraph "Storage"
            MySQL[(MySQL Shard)]
        end
        subgraph "Invalidation"
            AsyncInval[Async Invalidation]
        end
    end

    Apps[Applications] --> TAO1
    Apps --> TAO2
    Apps --> TAO3
    TAO1 -.->|Cache Miss| Leader
    TAO2 -.->|Cache Miss| Leader
    TAO3 -.->|Cache Miss| Leader
    Leader --> MySQL
    MySQL --> AsyncInval
    AsyncInval --> TAO1
    AsyncInval --> TAO2
    AsyncInval --> TAO3
```
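In code terms, the follower/leader split looks roughly like this - a toy sketch with dict-backed caches standing in for real TAO tiers, not the actual protocol:

```python
class TaoLeader:
    def __init__(self, mysql):
        self.store = {}
        self.mysql = mysql

    def get(self, key):
        if key not in self.store:
            self.store[key] = self.mysql.read(key)  # only the leader hits storage
        return self.store[key]


class TaoFollower:
    def __init__(self, leader):
        self.store = {}
        self.leader = leader

    def get(self, key):
        if key in self.store:
            return self.store[key]        # hit: served entirely in this tier
        value = self.leader.get(key)      # miss: forwarded up, never to MySQL
        self.store[key] = value           # repopulate for the next reader
        return value

    def invalidate(self, key):
        self.store.pop(key, None)         # what the async invalidator calls
```

The point of the hierarchy is fan-in: thousands of followers absorb read traffic, so the leader (and ultimately MySQL) only ever sees the misses.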
TAO’s Consistency Guarantees
Read-after-write consistency: Your writes are immediately visible to your reads.
Monotonic read consistency: Once you see a value, you never see an older version.
Timeline consistency: Events appear in the same order across all views.
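One way to picture the monotonic-read guarantee: track the highest version you’ve served and refuse anything older. A hedged sketch - the `(value, version)` return shape and `get_from_leader()` are illustrative assumptions, not TAO’s actual interface:

```python
class MonotonicReader:
    def __init__(self, cache):
        self.cache = cache
        self.seen = {}  # key -> highest version this client has observed

    def read(self, key):
        value, version = self.cache.get(key)
        if version < self.seen.get(key, 0):
            # The follower answered with something older than we've already
            # shown the user; escalate to the leader for a fresh copy.
            value, version = self.cache.get_from_leader(key)
        self.seen[key] = max(version, self.seen.get(key, 0))
        return value
```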
The Invalidation Pipeline
When data changes, TAO’s invalidation system ensures every cache gets updated:
- Write committed to MySQL
- MySQL trigger fires invalidation message
- Async invalidation service processes the message
- All TAO caches get invalidated simultaneously
- Next read repopulates with fresh data
Within a region this completes in a handful of milliseconds; propagating to every region typically takes tens of milliseconds.
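Steps 2-4 amount to a daemon that tails the database’s commit log and fans deletes out to every cache host. A hedged sketch - `binlog_tail`, `delete_async`, and the key-mapping rule are hypothetical stand-ins:

```python
def keys_touched_by(event):
    # Hypothetical mapping from a row change to the cache keys it affects.
    return [f"{event.table}:{event.row_id}"]

def invalidation_daemon(binlog_tail, cache_hosts):
    for event in binlog_tail:           # only committed writes appear here
        keys = keys_touched_by(event)
        for host in cache_hosts:
            host.delete_async(keys)     # fire-and-forget, retried on failure
```

Tailing the commit log rather than hooking application code means no write path can forget to invalidate: if it committed, it gets invalidated.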
Memcache: Scaling Raw Performance
For non-social data, Meta uses a massive memcache deployment with eventual consistency.
```mermaid
graph TB
    subgraph "Regional Cluster (US West)"
        subgraph FE_West["Frontend Cluster"]
            WebUSW[Web Servers]
            McRouterUSW[McRouter]
        end
        subgraph Pools_West["Cache Pool"]
            Pool1USW["Pool 1: User Data"]
            Pool2USW["Pool 2: Content"]
            Pool3USW["Pool 3: Media"]
        end
    end
    subgraph "Regional Cluster (US East)"
        subgraph FE_East["Frontend Cluster"]
            WebUSE[Web Servers]
            McRouterUSE[McRouter]
        end
        subgraph Pools_East["Cache Pool"]
            Pool1USE["Pool 1: User Data"]
            Pool2USE["Pool 2: Content"]
            Pool3USE["Pool 3: Media"]
        end
    end
    subgraph "Master Database"
        MySQL[(MySQL Master)]
        Invalidation[Invalidation Service]
    end

    WebUSW --> McRouterUSW
    WebUSE --> McRouterUSE
    McRouterUSW --> Pool1USW
    McRouterUSW --> Pool2USW
    McRouterUSW --> Pool3USW
    McRouterUSE --> Pool1USE
    McRouterUSE --> Pool2USE
    McRouterUSE --> Pool3USE
    MySQL --> Invalidation
    Invalidation --> Pool1USW
    Invalidation --> Pool1USE
    Invalidation --> Pool2USW
    Invalidation --> Pool2USE
    Invalidation --> Pool3USW
    Invalidation --> Pool3USE
```
McRouter: The Smart Proxy
McRouter sits between applications and memcache, providing:
- Consistent hashing for automatic sharding
- Failover handling when cache servers die
- Connection pooling for efficiency
- Invalidation routing to the right caches
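The consistent-hashing piece is what lets McRouter add or remove cache servers without remapping most keys. A toy ring, not McRouter’s actual implementation:

```python
import bisect
import hashlib

class HashRing:
    def __init__(self, servers, vnodes=100):
        # Each server gets many positions ("virtual nodes") to smooth the load.
        self.ring = sorted(
            (self._hash(f"{server}#{i}"), server)
            for server in servers for i in range(vnodes)
        )
        self.hashes = [h for h, _ in self.ring]

    @staticmethod
    def _hash(s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def server_for(self, key):
        # The first ring position clockwise from the key's hash owns the key.
        idx = bisect.bisect(self.hashes, self._hash(key)) % len(self.ring)
        return self.ring[idx][1]

ring = HashRing(["cache1", "cache2", "cache3"])
print(ring.server_for("user:42:profile"))  # deterministic shard choice
```

Removing one server from the ring only moves the keys that server owned; every other key keeps routing to the same host, which is exactly what a cache wants during failover.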
The Gutter Pool Pattern
Problem: What happens when a cache server dies and you have a “hot key” that everyone’s trying to read?
Solution: The gutter pool - a separate set of cache servers that temporarily store data for failed servers.
```mermaid
graph LR
    subgraph "Normal Operation"
        App1[App Server] --> McRouter1[McRouter]
        McRouter1 --> Cache1[Cache Server 1]
        McRouter1 --> Cache2[Cache Server 2]
        McRouter1 --> Cache3[Cache Server 3]
    end
    subgraph "Failure Scenario"
        App2[App Server] --> McRouter2[McRouter]
        McRouter2 --> CacheX[❌ Cache Server 2]
        McRouter2 --> Cache1b[Cache Server 1]
        McRouter2 --> Cache3b[Cache Server 3]
        McRouter2 -.->|Hot Key Fallback| Gutter[Gutter Pool]
    end
    subgraph "Database"
        DB[(Database)]
    end

    Cache1 -.->|Cache Miss| DB
    Gutter -.->|Cache Miss| DB
```
When Cache Server 2 fails, hot keys that were stored there get temporarily stored in the Gutter Pool, preventing database overload.
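A sketch of that fallback logic, assuming a cache client that raises `ServerDown` when a host is unreachable:

```python
class ServerDown(Exception):
    """Raised by the (assumed) cache client when a host is unreachable."""

GUTTER_TTL = 10  # seconds; short, so the gutter never serves old data for long

def get_with_gutter(key, primary_pool, gutter_pool, db):
    try:
        return primary_pool.get(key)       # normal path
    except ServerDown:
        value = gutter_pool.get(key)       # hot keys land here on failure
        if value is None:
            value = db.read(key)           # only true misses hit the database
            gutter_pool.set(key, value, ttl=GUTTER_TTL)
        return value
```

The short TTL is the design choice that makes this safe: the gutter pool is deliberately small and its entries expire quickly, so it absorbs the hot-key stampede without becoming a second, stale copy of the cache.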
Cross-Datacenter Consistency
Meta operates data centers and edge locations around the world. How do they keep caches consistent across continents?
The Primary-Replica Invalidation Model
```mermaid
graph TB
    subgraph "Primary Region (US)"
        PrimaryDB[(Primary Database)]
        PrimaryInval[Primary Invalidation]
        PrimaryCache[Primary Cache]
    end
    subgraph "Replica Region (EU)"
        ReplicaDB[(Replica Database)]
        ReplicaInval[Replica Invalidation]
        ReplicaCache[Replica Cache]
    end
    subgraph "Replica Region (Asia)"
        ReplicaDB2[(Replica Database)]
        ReplicaInval2[Replica Invalidation]
        ReplicaCache2[Replica Cache]
    end
    subgraph "Global Invalidation Bus"
        MessageQueue[Global Message Queue]
    end

    PrimaryDB --> PrimaryInval
    PrimaryInval --> PrimaryCache
    PrimaryInval --> MessageQueue
    MessageQueue --> ReplicaInval
    MessageQueue --> ReplicaInval2
    ReplicaInval --> ReplicaCache
    ReplicaInval2 --> ReplicaCache2
    PrimaryDB -.->|Replication| ReplicaDB
    PrimaryDB -.->|Replication| ReplicaDB2
```
The Invalidation Message Flow
- Write happens in primary region
- Local invalidation occurs immediately
- Global invalidation message sent to message queue
- Replica regions receive invalidation message
- Local caches in replica regions get invalidated
- Next read fetches from local database replica
Critical insight: invalidations must never outrun replication. If a key were invalidated before the replicated row arrived, the next cache miss would simply re-read stale data from the local replica. Meta sequences the two, so that by the time a cache miss follows an invalidation, the local database replica already holds the fresh data.
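One published way Meta achieves this ordering (in the memcache design) is to drive replica-region invalidations off the replica’s own replication stream, so a delete can never arrive before the data it refers to. A hedged sketch with hypothetical names:

```python
def keys_touched_by(txn):
    # Hypothetical mapping from a replicated transaction to affected cache keys.
    return [f"{txn.table}:{txn.row_id}"]

def replica_invalidation_daemon(replication_stream, local_caches):
    for txn in replication_stream:    # a txn appears only after the replica applied it
        for key in keys_touched_by(txn):
            for cache in local_caches:
                cache.delete(key)     # safe: a re-fetch now reads the fresh row
```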
Cache Consistency During Network Partitions
The problem: What happens when data centers can’t talk to each other?
Meta’s Consistency Hierarchy
During network partitions, Meta has a clear hierarchy:
- Prefer consistency over availability for social graph data
- Prefer availability over consistency for content data
- Always prefer user safety (better to show old data than wrong data)
```mermaid
graph TB
    subgraph "Normal State"
        App1[App] --> Cache1[Cache]
        Cache1 --> DB1[(Database)]
    end
    subgraph "Partition State"
        App2[App] --> Cache2[Cache]
        Cache2 --> Nothing[❌ No DB Access]
        Cache2 -.->|Stale Reads Only| StaleData[Stale Data]
    end
    subgraph "Recovery State"
        App3[App] --> Cache3[Cache]
        Cache3 --> DB3[(Database)]
        DB3 --> FullInvalidation[Full Cache Invalidation]
    end

    Nothing -.->|Network Heals| DB3
```
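Expressed as a routing policy, the hierarchy above might look like this - a hedged sketch where `PartitionError`, `Unavailable`, and the `data_class` labels are assumptions:

```python
class PartitionError(Exception):
    """Raised when the database is unreachable during a partition."""

class Unavailable(Exception):
    """Surfaced to the caller rather than risk showing wrong data."""

def read_during_partition(key, data_class, cache, db):
    try:
        return db.read(key)               # normal case: fresh data wins
    except PartitionError:
        if data_class == "social_graph":
            raise Unavailable(key)        # consistency over availability
        return cache.get_stale(key)       # availability over consistency
```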
The Lease System
To prevent thundering herds during failures, Meta uses a lease system:
- Cache miss occurs during partition
- Lease granted to one request per key
- Other requests wait for lease holder
- Lease holder fetches from database
- Result shared with all waiting requests
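The flow above, in a minimal single-process sketch: one caller per key wins the lease and fetches from the database, everyone else waits for the result. (Memcached’s real leases are tokens handed out by the cache server itself; the in-memory lock here is just for illustration.)

```python
import threading

leases = {}   # key -> threading.Event for in-flight fetches
lock = threading.Lock()

def get_with_lease(key, cache, db):
    value = cache.get(key)
    if value is not None:
        return value                      # ordinary hit

    with lock:
        event = leases.get(key)
        if event is None:                 # we won the lease for this key
            leases[key] = event = threading.Event()
            holder = True
        else:
            holder = False

    if holder:
        value = db.read(key)              # single database fetch per key
        cache.set(key, value)
        with lock:
            leases.pop(key).set()         # wake every waiter
        return value

    event.wait(timeout=1.0)               # wait for the lease holder
    return cache.get(key) or db.read(key) # fallback if the holder failed
```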
Monitoring Cache Consistency
Key insight: Monitor user experience, not just cache hit rates.
Meta tracks:
- Consistency violations per billion operations (target: <1)
- Invalidation propagation time (target: <10ms within a region)
- Stale read detection rate (target: <0.001%)
- Cross-datacenter lag in invalidations (target: <50ms)
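A toy version of the timestamp-comparison method: sample a small fraction of keys and compare the version the cache serves against the database’s authoritative version. The field names and sampling rate are assumptions:

```python
import random

def sample_consistency(cache, db, keys, rate=0.001):
    violations = 0
    sampled = [k for k in keys if random.random() < rate]  # tiny audit sample
    for key in sampled:
        cached = cache.get_with_version(key)   # assumed to return (value, version)
        authoritative = db.read_version(key)
        if cached is not None and cached[1] < authoritative:
            violations += 1                    # cache is behind the database
    return violations, len(sampled)
```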
```mermaid
graph LR
    subgraph "Monitoring Pipeline"
        Logs[Cache Access Logs]
        Detector[Consistency Detector]
        Alerts[Real-time Alerts]
        Dashboard[Consistency Dashboard]
    end
    subgraph "Detection Methods"
        Timestamps[Timestamp Comparison]
        Checksums[Data Checksums]
        Heartbeats[Invalidation Heartbeats]
    end

    Logs --> Detector
    Detector --> Timestamps
    Detector --> Checksums
    Detector --> Heartbeats
    Detector --> Alerts
    Detector --> Dashboard
```
Technology Stack
Caching: TAO (custom), Memcache (modified), McRouter (custom proxy)
Invalidation: Custom async invalidation service, Kafka-like message queues
Storage: MySQL (heavily modified), Custom graph database
Networking: Custom network protocols optimized for invalidation latency
Monitoring: Custom consistency monitoring, Real-time alerting
The Bottom Line
Meta’s cache consistency comes from smart architecture, not just fast hardware:
- Specialized systems for different consistency needs (TAO vs Memcache)
- Fast invalidation over fast reads (consistency first)
- Graceful degradation during failures (availability when possible)
- Monitoring user experience (detect consistency violations)
The result? A system that serves 3 billion users with near-perfect consistency at planetary scale.
Want to dive deeper into distributed systems? Check out our other architecture deep-dives and learn how the world’s biggest platforms solve difficult problems.