When you like a friend’s photo on Facebook, that like appears instantly across all their devices, to all their friends, and in all the places that photo appears on the platform. Behind this simple action lies one of the most sophisticated distributed cache consistency systems ever built.

Meta serves 3+ billion users with 99.99999999% cache consistency - fewer than one inconsistent read per 10 billion operations. They process trillions of cache operations daily across data centers worldwide, while keeping that data closely in sync. This isn’t just about speed - it’s about making sure 3 billion people see the same version of reality.

The core challenges they solved:

  • Near-perfect consistency across global data centers
  • Sub-millisecond latency for cache reads
  • Instant invalidation when data changes
  • Fault tolerance during network partitions
  • Linear scaling to billions of operations

Let’s see how they built a system that almost never shows stale data.

Meta’s Cache Architecture: The 30,000-Foot View

Before diving into the consistency mechanisms, here’s how Meta’s cache system looks from above:


graph TB
    subgraph "📱 Client Apps"
        Clients[Web, Mobile, Internal Apps]
    end
    
    subgraph "🖥️ Application Servers"
        Apps[Web Servers & API Servers]
    end
    
    subgraph "⚡ Cache Layer"
        direction LR
        TAO[TAO<br/>Graph Cache<br/>Social data]
        Memcache[Memcache<br/>Key-Value Cache<br/>Profile data]
    end
    
    subgraph "🗃️ Storage Layer"
        direction LR
        MySQL[(MySQL Shards)]
        GraphDB[(Graph Database)]
    end
    
    subgraph "🔄 Consistency System"
        Invalidator[Cache Invalidation Service<br/>Ensures cache consistency]
    end
    
    %% Main request flow
    Clients --> Apps
    Apps --> TAO
    Apps --> Memcache
    
    %% Cache to storage
    TAO --> GraphDB
    Memcache --> MySQL
    
    %% Consistency flow (simplified)
    GraphDB --> Invalidator
    MySQL --> Invalidator
    Invalidator --> TAO
    Invalidator --> Memcache

Two Key Data Flows:


graph LR
    subgraph "🔥 Hot Path: Write-Through (Strong Consistency)"
        direction TB
        App1[App Write Request]
        App1 --> TAO1[TAO Cache]
        TAO1 -->|1. Write to DB first| GraphDB1[(Graph DB)]
        GraphDB1 -->|2. Then invalidate| TAO1
        TAO1 -->|3. Response| App1
        
        style TAO1 fill:#ffebee
        style GraphDB1 fill:#e8f5e8
    end
    
    subgraph "❄️ Cold Path: Write-Behind (Eventual Consistency)"
        direction TB
        App2[App Write Request]
        App2 --> MySQL1[(MySQL)]
        MySQL1 -->|1. DB write triggers| Inval2[Async Invalidator]
        Inval2 -->|2. Invalidate later| Memcache1[Memcache]
        App2 -->|3. Immediate response| Response2[Response]
        
        style Memcache1 fill:#fff3e0
        style MySQL1 fill:#e8f5e8
    end

The Two-Layer Cache Strategy

Meta’s key insight: different data needs different consistency guarantees.

TAO: The Graph Cache (Strong Consistency)

  • Social graph data (friends, likes, comments)
  • Immediate consistency required
  • Write-through caching with instant invalidation
  • Handles complex relationships and queries

Memcache: The Data Cache (Eventual Consistency)

  • Profile data, posts, media metadata
  • Can tolerate brief inconsistency
  • Write-behind caching with delayed invalidation
  • Optimized for raw throughput

Why split them? Social interactions must be instantly consistent (you can’t “unlike” something that never showed as liked), but profile updates can have eventual consistency.
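
The difference between the two paths is easiest to see in code. Below is a minimal Python sketch, with plain dicts and a queue standing in for the real cache, database, and messaging clients; every name is illustrative, not Meta’s API:

import queue
import threading

# Toy stand-ins for the real systems; names are illustrative, not Meta's APIs.
tao_cache: dict = {}        # TAO-style graph cache
memcache: dict = {}         # memcache-style key-value cache
graph_db: dict = {}         # graph database
mysql: dict = {}            # MySQL shard
invalidation_queue: queue.Queue = queue.Ueue() if False else queue.Queue()

def write_through_like(edge_key: str, value: dict) -> None:
    """Hot path (TAO-style): write the database first, then fix the cache
    before responding, so a follow-up read can never miss the new like."""
    graph_db[edge_key] = value          # 1. durable write
    tao_cache.pop(edge_key, None)       # 2. synchronous invalidation
    # 3. only now does the caller get its response

def write_behind_profile(key: str, value: dict) -> None:
    """Cold path (memcache-style): write the database and respond immediately;
    the cache entry is invalidated later by an async consumer."""
    mysql[key] = value                  # 1. durable write
    invalidation_queue.put(key)         # 2. invalidation handled asynchronously
    # 3. response returns before the cache is touched

def invalidation_worker() -> None:
    """Async consumer that eventually deletes stale memcache entries."""
    while True:
        key = invalidation_queue.get()
        memcache.pop(key, None)
        invalidation_queue.task_done()

threading.Thread(target=invalidation_worker, daemon=True).start()

write_through_like("like:alice->photo123", {"liked": True})
write_behind_profile("profile:alice", {"bio": "new bio"})
invalidation_queue.join()   # in this sketch, wait for the async path to catch up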

Here’s how a like flows through both systems:


sequenceDiagram
    participant User as 👤 User
    participant Web as 🖥️ Web Server
    participant TAO as ⚡ TAO Cache
    participant DB as 🗃️ Database
    participant Invalidator as 🚫 Invalidator
    participant Friends as 👥 Friend Feeds
    
    User->>Web: Like photo
    Web->>TAO: Check permissions
    TAO->>DB: Read graph data
    Web->>DB: Write like record
    DB->>Invalidator: Trigger invalidation
    Invalidator->>TAO: Invalidate like cache
    Invalidator->>Friends: Invalidate feed caches
    Web->>User: Confirm like
    
    Note over TAO,Friends: All caches updated before response

The TAO Graph Cache: Where Consistency Matters Most

TAO (The Associations and Objects) is Meta’s secret weapon for strong consistency. It’s a write-through cache that sits between applications and the graph database.


graph LR
    subgraph "TAO Architecture"
        subgraph "TAO Tier 1 (Follower)"
            TAO1[TAO Follower 1]
            TAO2[TAO Follower 2] 
            TAO3[TAO Follower 3]
        end
        
        subgraph "TAO Tier 2 (Leader)"
            Leader[TAO Leader]
        end
        
        subgraph "Storage"
            MySQL[(MySQL Shard)]
        end
        
        subgraph "Invalidation"
            AsyncInval[Async Invalidation]
        end
    end
    
    Apps[Applications] --> TAO1
    Apps --> TAO2
    Apps --> TAO3
    
    TAO1 -.->|Cache Miss| Leader
    TAO2 -.->|Cache Miss| Leader
    TAO3 -.->|Cache Miss| Leader
    
    Leader --> MySQL
    MySQL --> AsyncInval
    AsyncInval --> TAO1
    AsyncInval --> TAO2
    AsyncInval --> TAO3

TAO’s Consistency Guarantees

Read-after-write consistency: Your writes are immediately visible to your reads.

Monotonic read consistency: Once you see a value, you never see an older version.

Timeline consistency: Events appear in the same order across all views.
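
One common way to reason about read-after-write and monotonic reads is to version every cached value and refuse to go backwards. The sketch below illustrates that idea only; it is not TAO’s actual implementation:

from __future__ import annotations

from dataclasses import dataclass

@dataclass
class Versioned:
    value: object
    version: int    # monotonically increasing per key

class MonotonicCache:
    """Toy cache that enforces monotonic reads: once a reader has seen
    version N of a key, it never accepts an older version again."""

    def __init__(self):
        self._store: dict[str, Versioned] = {}
        self._seen: dict[str, int] = {}   # highest version observed per key

    def write(self, key: str, value: object, version: int) -> None:
        current = self._store.get(key)
        if current is None or version > current.version:
            self._store[key] = Versioned(value, version)

    def read(self, key: str) -> object | None:
        entry = self._store.get(key)
        if entry is None:
            return None
        if entry.version < self._seen.get(key, 0):
            # Stale relative to what was already seen: treat as a miss and
            # fall through to the database instead of going backwards.
            return None
        self._seen[key] = entry.version
        return entry.value

cache = MonotonicCache()
cache.write("post:42:likes", 10, version=1)
assert cache.read("post:42:likes") == 10      # read-after-write
cache.write("post:42:likes", 9, version=0)    # late, out-of-order update is ignored
assert cache.read("post:42:likes") == 10      # monotonic: never see the older value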

The Invalidation Pipeline

When data changes, TAO’s invalidation system ensures every cache gets updated:

  1. Write committed to MySQL
  2. MySQL trigger fires invalidation message
  3. Async invalidation service processes the message
  4. All TAO caches get invalidated simultaneously
  5. Next read repopulates with fresh data

This happens in under 10 milliseconds globally.
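
In outline, the pipeline looks like the sketch below, with an in-process queue standing in for Meta’s messaging infrastructure and dicts standing in for MySQL and the TAO tiers; all names are hypothetical:

import queue
import threading
import time

invalidation_bus: queue.Queue = queue.Queue()

# Three TAO follower tiers, modeled as plain dicts.
tao_followers = [dict(), dict(), dict()]
mysql_shard: dict = {}

def commit_write(key: str, value: object) -> None:
    """Steps 1-2: commit to MySQL, then emit an invalidation message."""
    mysql_shard[key] = value
    invalidation_bus.put({"key": key, "ts": time.time()})

def invalidation_service() -> None:
    """Steps 3-4: consume messages and invalidate every follower tier."""
    while True:
        msg = invalidation_bus.get()
        for tier in tao_followers:
            tier.pop(msg["key"], None)
        invalidation_bus.task_done()

def read(tier_index: int, key: str) -> object:
    """Step 5: a miss after invalidation repopulates from the database."""
    tier = tao_followers[tier_index]
    if key not in tier:
        tier[key] = mysql_shard[key]    # refill with fresh data
    return tier[key]

threading.Thread(target=invalidation_service, daemon=True).start()

commit_write("friend_count:alice", 201)
invalidation_bus.join()                 # wait for invalidation in this sketch
print(read(0, "friend_count:alice"))    # 201, freshly repopulated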

Memcache: Scaling Raw Performance

For non-social data, Meta uses a massive memcache deployment with eventual consistency.


graph TB
    subgraph "Regional Cluster (US West)"
        subgraph "Frontend Cluster"
            WebUSW[Web Servers]
            McRouterUSW[McRouter]
        end
        subgraph "Cache Pool"
            Pool1USW[Pool 1: User Data]
            Pool2USW[Pool 2: Content] 
            Pool3USW[Pool 3: Media]
        end
    end
    
    subgraph "Regional Cluster (US East)"
        subgraph "Frontend Cluster"
            WebUSE[Web Servers]
            McRouterUSE[McRouter]
        end
        subgraph "Cache Pool"
            Pool1USE[Pool 1: User Data]
            Pool2USE[Pool 2: Content]
            Pool3USE[Pool 3: Media]
        end
    end
    
    subgraph "Master Database"
        MySQL[(MySQL Master)]
        Invalidation[Invalidation Service]
    end
    
    WebUSW --> McRouterUSW
    WebUSE --> McRouterUSE
    McRouterUSW --> Pool1USW
    McRouterUSW --> Pool2USW
    McRouterUSW --> Pool3USW
    McRouterUSE --> Pool1USE
    McRouterUSE --> Pool2USE
    McRouterUSE --> Pool3USE
    
    MySQL --> Invalidation
    Invalidation --> Pool1USW
    Invalidation --> Pool1USE
    Invalidation --> Pool2USW
    Invalidation --> Pool2USE
    Invalidation --> Pool3USW
    Invalidation --> Pool3USE

McRouter: The Smart Proxy

McRouter sits between applications and memcache, providing:

  • Consistent hashing for automatic sharding
  • Failover handling when cache servers die
  • Connection pooling for efficiency
  • Invalidation routing to the right caches
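
The first of these, consistent hashing, is what keeps the key-to-server mapping stable when servers come and go. A minimal hash-ring sketch follows; McRouter’s real implementation uses its own hash functions and virtual-node tuning:

import bisect
import hashlib

class HashRing:
    """Minimal consistent-hash ring with virtual nodes."""

    def __init__(self, servers: list, vnodes: int = 100):
        self._ring = []     # list of (point, server) pairs, sorted by point
        for server in servers:
            for i in range(vnodes):
                self._ring.append((self._hash(f"{server}#{i}"), server))
        self._ring.sort()
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(key: str) -> int:
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

    def route(self, key: str) -> str:
        """Pick the first server clockwise from the key's position on the ring."""
        idx = bisect.bisect(self._points, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]

ring = HashRing(["cache1", "cache2", "cache3"])
print(ring.route("user:12345:profile"))   # always the same server for this key

Because only the virtual nodes of a removed server move, losing one server reshuffles roughly 1/N of the keys instead of all of them.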

The Gutter Pool Pattern

Problem: What happens when a cache server dies and you have a “hot key” that everyone’s trying to read?

Solution: The gutter pool - a separate set of cache servers that temporarily store data for failed servers.


graph LR
    subgraph "Normal Operation"
        App1[App Server] --> McRouter1[McRouter]
        McRouter1 --> Cache1[Cache Server 1]
        McRouter1 --> Cache2[Cache Server 2]
        McRouter1 --> Cache3[Cache Server 3]
    end
    
    subgraph "Failure Scenario"
        App2[App Server] --> McRouter2[McRouter]
        McRouter2 --> CacheX[❌ Cache Server 2]
        McRouter2 --> Cache1b[Cache Server 1]
        McRouter2 --> Cache3b[Cache Server 3]
        McRouter2 -.->|Hot Key Fallback| Gutter[Gutter Pool]
    end
    
    subgraph "Database"
        DB[(Database)]
    end
    
    Cache1 -.->|Cache Miss| DB
    Gutter -.->|Cache Miss| DB

When Cache Server 2 fails, hot keys that were stored there get temporarily stored in the Gutter Pool, preventing database overload.
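
In code, the gutter pattern is just a second routing step taken only when the primary server is marked down, usually with a short TTL so gutter entries vanish once the main pool recovers. A toy sketch with hypothetical names:

import time

class GutterAwareClient:
    """Toy client: route to the primary cache pool, fall back to a small
    gutter pool (with short TTLs) when the primary server is down."""

    GUTTER_TTL = 10.0  # seconds; keep gutter entries short-lived

    def __init__(self, primary_pool: dict, gutter_pool: dict, db: dict):
        self.primary = primary_pool      # server_name -> {key: value}
        self.gutter = gutter_pool        # key -> (value, expires_at)
        self.db = db
        self.down = set()                # servers currently marked as failed

    def _server_for(self, key: str) -> str:
        # Stand-in for the consistent hashing shown in the previous sketch.
        return f"cache{hash(key) % len(self.primary) + 1}"

    def get(self, key: str):
        server = self._server_for(key)
        if server not in self.down:
            return self.primary[server].setdefault(key, self.db[key])
        # Primary server is down: serve the hot key out of the gutter pool.
        entry = self.gutter.get(key)
        if entry and entry[1] > time.time():
            return entry[0]
        value = self.db[key]                               # single DB fetch
        self.gutter[key] = (value, time.time() + self.GUTTER_TTL)
        return value

db = {"photo:99:meta": {"owner": "alice"}}
client = GutterAwareClient({"cache1": {}, "cache2": {}}, {}, db)
client.down.add(client._server_for("photo:99:meta"))      # simulate a failure
print(client.get("photo:99:meta"))                        # served via gutter pool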

Cross-Datacenter Consistency

Meta operates data center regions on multiple continents. How do they keep caches consistent across them?

The Primary-Replica Invalidation Model


graph TB
    subgraph "Primary Region (US)"
        PrimaryDB[(Primary Database)]
        PrimaryInval[Primary Invalidation]
        PrimaryCache[Primary Cache]
    end
    
    subgraph "Replica Region (EU)"
        ReplicaDB[(Replica Database)]
        ReplicaInval[Replica Invalidation]
        ReplicaCache[Replica Cache]
    end
    
    subgraph "Replica Region (Asia)"
        ReplicaDB2[(Replica Database)]
        ReplicaInval2[Replica Invalidation]
        ReplicaCache2[Replica Cache]
    end
    
    subgraph "Global Invalidation Bus"
        MessageQueue[Global Message Queue]
    end
    
    PrimaryDB --> PrimaryInval
    PrimaryInval --> PrimaryCache
    PrimaryInval --> MessageQueue
    
    MessageQueue --> ReplicaInval
    MessageQueue --> ReplicaInval2
    
    ReplicaInval --> ReplicaCache
    ReplicaInval2 --> ReplicaCache2
    
    PrimaryDB -.->|Replication| ReplicaDB
    PrimaryDB -.->|Replication| ReplicaDB2

The Invalidation Message Flow

  1. Write happens in primary region
  2. Local invalidation occurs immediately
  3. Global invalidation message sent to message queue
  4. Replica regions receive invalidation message
  5. Local caches in replica regions get invalidated
  6. Next read fetches from local database replica

Critical insight: Invalidations are sequenced with database replication, so by the time a replica region’s cache entry is dropped, the local database replica already holds the new data and the next cache miss repopulates with a fresh value.
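
A simplified sketch of that flow, using an in-process queue in place of the global message queue and dicts in place of the regional cache tiers; the names are illustrative only:

import queue
import threading

global_bus: queue.Queue = queue.Queue()

regional_caches = {
    "us": {"profile:alice": "old bio"},
    "eu": {"profile:alice": "old bio"},
    "asia": {"profile:alice": "old bio"},
}

def publish_invalidation(key: str) -> None:
    """Primary region: invalidate locally, then fan out to replica regions."""
    regional_caches["us"].pop(key, None)          # step 2: local invalidation
    for region in ("eu", "asia"):                 # step 3: global message
        global_bus.put({"region": region, "key": key})

def regional_consumer() -> None:
    """Replica regions: apply invalidations to their local caches (steps 4-5)."""
    while True:
        msg = global_bus.get()
        regional_caches[msg["region"]].pop(msg["key"], None)
        global_bus.task_done()

threading.Thread(target=regional_consumer, daemon=True).start()

publish_invalidation("profile:alice")
global_bus.join()
print(regional_caches)    # the stale entry is gone in every region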

Cache Consistency During Network Partitions

The problem: What happens when data centers can’t talk to each other?

Meta’s Consistency Hierarchy

During network partitions, Meta has a clear hierarchy:

  1. Prefer consistency over availability for social graph data
  2. Prefer availability over consistency for content data
  3. Always prefer user safety (better to show old data than wrong data)

graph TB
    subgraph "Normal State"
        App1[App] --> Cache1[Cache]
        Cache1 --> DB1[(Database)]
    end
    
    subgraph "Partition State"
        App2[App] --> Cache2[Cache]
        Cache2 --> Nothing[❌ No DB Access]
        Cache2 -.->|Stale Reads Only| StaleData[Stale Data]
    end
    
    subgraph "Recovery State"
        App3[App] --> Cache3[Cache]
        Cache3 --> DB3[(Database)]
        DB3 --> FullInvalidation[Full Cache Invalidation]
    end
    
    Nothing -.->|Network Heals| DB3
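
Read together, the hierarchy and the state diagram above amount to a per-data-class policy. The sketch below illustrates that trade-off with hypothetical names; it is not Meta’s code:

class PartitionAwareCache:
    """Toy policy sketch: while the database is unreachable, content reads may
    fall back to stale cache entries, but social-graph reads fail instead of
    risking a wrong answer; recovery wipes the cache so reads refill fresh."""

    def __init__(self, db_fetch):
        self._db_fetch = db_fetch
        self._cache: dict = {}
        self.partitioned = False         # flipped when the DB becomes unreachable

    def get(self, key: str, data_class: str):
        if not self.partitioned:
            if key not in self._cache:                  # normal cache-aside read
                self._cache[key] = self._db_fetch(key)
            return self._cache[key]
        if data_class == "content" and key in self._cache:
            return self._cache[key]                     # availability: stale is OK
        raise RuntimeError("graph data prefers consistency over availability")

    def recover(self) -> None:
        """Network heals: full cache invalidation before serving again."""
        self.partitioned = False
        self._cache.clear()

cache = PartitionAwareCache(db_fetch=lambda key: f"fresh-{key}")
cache.get("profile:alice", data_class="content")         # cached while healthy
cache.partitioned = True                                  # simulate a partition
print(cache.get("profile:alice", data_class="content"))   # stale but available
cache.recover()                                           # drop everything on heal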

The Lease System

To prevent thundering herds during failures, Meta uses a lease system:

  1. Cache miss occurs during partition
  2. Lease granted to one request per key
  3. Other requests wait for lease holder
  4. Lease holder fetches from database
  5. Result shared with all waiting requests
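
The sketch below approximates the idea with a per-key lock: one caller becomes the lease holder and fetches from the database, while concurrent callers for the same key wait and reuse the result. Memcached’s real lease mechanism is token-based, so this is an approximation, not the production protocol:

import threading

class LeasedCache:
    """Toy lease: on a miss, one caller per key fetches from the database;
    concurrent callers for the same key block and reuse the result."""

    def __init__(self, db_fetch):
        self._db_fetch = db_fetch
        self._cache: dict = {}
        self._leases: dict = {}          # key -> Lock held by the lease owner
        self._guard = threading.Lock()

    def get(self, key: str):
        if key in self._cache:
            return self._cache[key]
        with self._guard:
            lease = self._leases.get(key)
            if lease is None:
                lease = threading.Lock()
                lease.acquire()          # this caller becomes the lease holder
                self._leases[key] = lease
                owner = True
            else:
                owner = False
        if owner:
            try:
                self._cache[key] = self._db_fetch(key)   # single DB hit
            finally:
                lease.release()
                with self._guard:
                    self._leases.pop(key, None)
            return self._cache[key]
        # Not the owner: wait for the lease holder, then read the filled value.
        with lease:
            return self._cache[key]

cache = LeasedCache(db_fetch=lambda key: f"value-for-{key}")
print(cache.get("hot:key"))   # only this call would touch the database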

Monitoring Cache Consistency

Key insight: Monitor user experience, not just cache hit rates.

Meta tracks:

  • Consistency violations per billion operations (target: <1)
  • Invalidation propagation time (target: <10ms globally)
  • Stale read detection rate (target: <0.001%)
  • Cross-datacenter lag in invalidations (target: <50ms)

graph LR
    subgraph "Monitoring Pipeline"
        Logs[Cache Access Logs]
        Detector[Consistency Detector]
        Alerts[Real-time Alerts]
        Dashboard[Consistency Dashboard]
    end
    
    subgraph "Detection Methods"
        Timestamps[Timestamp Comparison]
        Checksums[Data Checksums]
        Heartbeats[Invalidation Heartbeats]
    end
    
    Logs --> Detector
    Detector --> Timestamps
    Detector --> Checksums
    Detector --> Heartbeats
    Detector --> Alerts
    Detector --> Dashboard
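
One of the detection methods above, timestamp comparison, can be sketched as an offline job over access logs that flags reads returning data older than an earlier committed write for the same key; the record format and grace period here are assumptions for illustration:

from dataclasses import dataclass

@dataclass
class LogRecord:
    key: str
    kind: str        # "write" committed to the DB, or "cache_read" served to a user
    version: int     # monotonically increasing version written/observed
    ts_ms: int       # event time in milliseconds

def count_consistency_violations(records: list, grace_ms: int = 10) -> int:
    """Flag cache reads that returned a version older than a write committed
    more than `grace_ms` earlier (allowing for invalidation propagation)."""
    latest_write = {}
    violations = 0
    for rec in sorted(records, key=lambda r: r.ts_ms):
        if rec.kind == "write":
            latest_write[rec.key] = rec
        elif rec.kind == "cache_read":
            last = latest_write.get(rec.key)
            if last and rec.version < last.version and rec.ts_ms - last.ts_ms > grace_ms:
                violations += 1
    return violations

logs = [
    LogRecord("post:42", "write", version=2, ts_ms=1_000),
    LogRecord("post:42", "cache_read", version=2, ts_ms=1_005),   # fresh: fine
    LogRecord("post:42", "cache_read", version=1, ts_ms=1_050),   # stale: violation
]
print(count_consistency_violations(logs))   # 1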

Technology Stack

Caching: TAO (custom), Memcache (modified), McRouter (custom proxy)

Invalidation: Custom async invalidation service, Kafka-like message queues

Storage: MySQL (heavily modified), Custom graph database

Networking: Custom network protocols optimized for invalidation latency

Monitoring: Custom consistency monitoring, Real-time alerting

The Bottom Line

Meta’s cache consistency comes from smart architecture, not just fast hardware:

  • Specialized systems for different consistency needs (TAO vs Memcache)
  • Fast invalidation over fast reads (consistency first)
  • Graceful degradation during failures (availability when possible)
  • Monitoring user experience (detect consistency violations)

The result? A system that serves 3 billion users with near-perfect consistency at planetary scale.


Want to dive deeper into distributed systems? Check out our other architecture deep-dives and learn how the world’s biggest platforms solve difficult problems.