@Ajit5ingh

Change Data Capture (CDC)

Keeping your cache in sync with your database

What is CDC?

Change Data Capture (CDC) is a way to track changes in your database and automatically pass them on to other systems, like your cache, search index, or analytics warehouse. Instead of constantly asking "did anything change?", CDC tells you "hey, this just changed!" shortly after each change is committed.

Think of it as your database sending push notifications whenever data changes, so your cache always knows what to update.

The Problem: Stale Cache

Without CDC

Your cache gets out of sync with the database:

User updates email in database

Cache still has old email

App shows wrong data!

The usual workaround? Manually clear the cache or wait for entries to expire. Slow and error-prone.
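To make the stale-read problem concrete, here is a minimal sketch of a cache-aside read path with a TTL. The dict standing in for the database, the 60-second TTL, and the manual `now` clock are all hypothetical, chosen only so the staleness window is easy to see.

```python
# Minimal sketch of the "Without CDC" problem: cache-aside reads with a TTL.
# The dict "database", the TTL value and the explicit `now` clock are
# hypothetical stand-ins chosen to make the stale window visible.


class TTLCache:
    """Tiny cache whose entries expire `ttl` seconds after being stored."""

    def __init__(self, ttl: float) -> None:
        self.ttl = ttl
        self._entries: dict[str, tuple[object, float]] = {}

    def get(self, key: str, now: float):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, stored_at = entry
        if now - stored_at >= self.ttl:
            del self._entries[key]  # expired: force a fresh database read
            return None
        return value

    def set(self, key: str, value: object, now: float) -> None:
        self._entries[key] = (value, now)


database = {"user:1": "old@example.com"}
cache = TTLCache(ttl=60.0)


def read_email(key: str, now: float) -> str:
    """Cache-aside: try the cache first, fall back to the database."""
    cached = cache.get(key, now)
    if cached is not None:
        return cached
    value = database[key]
    cache.set(key, value, now)
    return value


print(read_email("user:1", now=0.0))    # old@example.com (cache populated)
database["user:1"] = "new@example.com"  # user updates email; cache is not told
print(read_email("user:1", now=30.0))   # old@example.com (stale!)
print(read_email("user:1", now=61.0))   # new@example.com (TTL expired)
```

For up to a full TTL after the update, readers see the old email. Shortening the TTL narrows that window but sends more reads to the database, which is the trade-off CDC avoids.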

With CDC

Changes automatically sync to your cache:

User updates email in database

CDC picks up the change from the transaction log

Cache updates automatically!

Result: The cache has fresh data moments after each write. No manual invalidation needed.

How CDC Works


sequenceDiagram
    participant App
    participant Database
    participant CDC
    participant Cache

    App->>Database: UPDATE user email
    Database->>Database: Write to transaction log
    Database-->>App: Success

    Note over CDC: Continuously reads<br/>transaction log
    Database->>CDC: New change detected
    CDC->>Cache: Update user email
    Cache-->>CDC: Updated
    Note over Cache: Cache now has<br/>fresh data

    App->>Cache: GET user
    Cache-->>App: Returns updated email
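The sequence above can be modeled in a few lines of Python. This is a toy sketch: the in-memory list standing in for the transaction log, the `LogEntry` fields, and the `Database` and `CDCReader` classes are all hypothetical. Real tools read the PostgreSQL WAL or MySQL binlog instead, but the shape is the same: the app only writes to the database, and a separate reader tails the log and updates the cache.

```python
# Toy model of log-based CDC. LogEntry, Database and CDCReader are
# hypothetical stand-ins for a real WAL/binlog and a real CDC tool.
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LogEntry:
    position: int        # increasing log position (similar in spirit to an LSN)
    table: str
    key: int
    row: Optional[dict]  # None means the row was deleted


class Database:
    """Applies writes to a table and appends each one to its transaction log."""

    def __init__(self) -> None:
        self.users: dict[int, dict] = {}
        self.log: list[LogEntry] = []

    def update_user(self, key: int, row: dict) -> None:
        self.users[key] = row
        self.log.append(LogEntry(len(self.log) + 1, "users", key, row))

    def delete_user(self, key: int) -> None:
        self.users.pop(key, None)
        self.log.append(LogEntry(len(self.log) + 1, "users", key, None))


class CDCReader:
    """Reads log entries past the last position it saw and updates the cache."""

    def __init__(self, db: Database, cache: dict) -> None:
        self.db = db
        self.cache = cache
        self.last_position = 0

    def poll_log(self) -> int:
        new_entries = [e for e in self.db.log if e.position > self.last_position]
        for entry in new_entries:
            cache_key = f"{entry.table}:{entry.key}"
            if entry.row is None:
                self.cache.pop(cache_key, None)  # row deleted: evict
            else:
                self.cache[cache_key] = entry.row  # row written: upsert
            self.last_position = entry.position
        return len(new_entries)


db = Database()
cache: dict[str, dict] = {}
cdc = CDCReader(db, cache)

db.update_user(1, {"email": "old@example.com"})
db.update_user(1, {"email": "new@example.com"})  # the app only talks to the DB
cdc.poll_log()
print(cache["users:1"]["email"])  # new@example.com
```

Tracking `last_position` is what lets a CDC reader pick up exactly where it left off; real tools persist that position so a restart doesn't skip or replay the whole log. Here it lives only in memory.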

Key Benefits

Real-Time Updates

Changes propagate in near real time, usually within seconds. Your cache, search indexes, and other systems update shortly after the data changes, with no polling interval to wait for.

Low Database Impact

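CDC reads from the transaction log the database already writes, not your actual tables. Once it's running, no extra queries hit your tables; it just quietly watches the log. The cost is small but not zero: the database has to keep log segments around until CDC has read them.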

No Code Changes

Your app doesn't need to know CDC exists. It writes to the database normally, and CDC handles the rest behind the scenes. (The database itself may need configuration changes, such as enabling logical replication.)

Common CDC Methods

Transaction Log

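Reads the database's built-in transaction log (PostgreSQL WAL, MySQL binlog, etc.). Most efficient, with minimal impact on database performance, though the database must be configured to expose row-level changes (e.g. wal_level=logical in PostgreSQL, binlog_format=ROW in MySQL).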

Best for: Production systems

Triggers

Database triggers fire on INSERT/UPDATE/DELETE and write changes to a separate table. Simple but adds overhead to every write.

Use when: You can't access the logs
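Here is a runnable sketch of trigger-based CDC using Python's built-in `sqlite3`. The table names (`users`, `user_changes`) and columns are hypothetical, and PostgreSQL and MySQL use their own `CREATE TRIGGER` syntax, but the pattern is the same: every write also inserts a row into a change table, and a consumer reads that table in order.

```python
# Trigger-based CDC with sqlite3. Table and column names are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL);

-- Side table the triggers write every change into.
CREATE TABLE user_changes (
    change_id INTEGER PRIMARY KEY AUTOINCREMENT,
    op        TEXT NOT NULL,   -- 'INSERT', 'UPDATE' or 'DELETE'
    user_id   INTEGER NOT NULL,
    email     TEXT             -- new value; NULL for deletes
);

CREATE TRIGGER users_ins AFTER INSERT ON users BEGIN
    INSERT INTO user_changes (op, user_id, email) VALUES ('INSERT', NEW.id, NEW.email);
END;

CREATE TRIGGER users_upd AFTER UPDATE ON users BEGIN
    INSERT INTO user_changes (op, user_id, email) VALUES ('UPDATE', NEW.id, NEW.email);
END;

CREATE TRIGGER users_del AFTER DELETE ON users BEGIN
    INSERT INTO user_changes (op, user_id, email) VALUES ('DELETE', OLD.id, NULL);
END;
""")

# Normal application writes. Each one silently costs an extra insert:
# that is the per-write overhead of the trigger approach.
conn.execute("INSERT INTO users (id, email) VALUES (1, 'old@example.com')")
conn.execute("UPDATE users SET email = 'new@example.com' WHERE id = 1")
conn.execute("DELETE FROM users WHERE id = 1")
conn.commit()

# A consumer reads changes in order, after the last change_id it processed.
last_seen = 0
rows = conn.execute(
    "SELECT change_id, op, user_id, email FROM user_changes "
    "WHERE change_id > ? ORDER BY change_id",
    (last_seen,),
).fetchall()
for change_id, op, user_id, email in rows:
    print(change_id, op, user_id, email)
```

Because the triggers run inside the same transaction as the write, a rolled-back write leaves no change row behind. The flip side is that every INSERT/UPDATE/DELETE now does extra work.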

Timestamp Polling

Periodically query for rows with updated_at > last_check. Easy to set up, but not real-time, adds database load, and misses hard deletes (a deleted row has no updated_at left to find).

Avoid if: You need real-time data
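A short `sqlite3` sketch of timestamp polling shows both how it works and where it falls short. The `updated_at` column, the integer timestamps, and the `poll_changes()` helper are hypothetical; the point is the `updated_at > last_check` query and what it can't see.

```python
# Timestamp polling with sqlite3. updated_at values and poll_changes()
# are hypothetical; real systems would use actual timestamps.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, updated_at INTEGER)")


def poll_changes(last_check: int) -> list[tuple]:
    """Return rows modified after last_check (one query per poll interval)."""
    return conn.execute(
        "SELECT id, email, updated_at FROM users WHERE updated_at > ? ORDER BY updated_at",
        (last_check,),
    ).fetchall()


conn.execute("INSERT INTO users VALUES (1, 'a@example.com', 100)")
conn.execute("INSERT INTO users VALUES (2, 'b@example.com', 100)")
last_check = 0
changed = poll_changes(last_check)           # first poll sees both rows
last_check = max(row[2] for row in changed)  # remember the newest timestamp: 100

conn.execute("UPDATE users SET email = 'a2@example.com', updated_at = 105 WHERE id = 1")
conn.execute("DELETE FROM users WHERE id = 2")  # hard delete: no row left to match
print(poll_changes(last_check))  # [(1, 'a2@example.com', 105)], delete of id 2 is invisible
```

Polling also sees only the latest state of each row: if a row changes twice between polls, the intermediate value is never observed.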

Common Use Cases

  • Cache Sync: Keep Redis or Memcached in sync with your database. User updates profile? Cache updates instantly.
  • Search Indexing: Auto-update Elasticsearch when products change. No manual reindexing needed.
  • Data Warehouse: Stream changes to your analytics database (Snowflake, BigQuery) for near real-time reporting.
  • Microservices: Keep data synced across services. Orders service updates inventory? Warehouse service knows immediately.
  • Event Streaming: Feed changes into Kafka for event-driven architectures and real-time processing.

CDC in Action


graph TD
    A[PostgreSQL Database] --> B[CDC Tool<br/>Debezium/Maxwell/etc]
    B --> C[Redis Cache]
    B --> D[Elasticsearch]
    B --> E[Data Warehouse]
    B --> F[Kafka Stream]
    G[Your Application] --> A
    G --> C
    style A fill:#e0f2fe,stroke:#0369a1,stroke-width:2px
    style B fill:#fef3c7,stroke:#f59e0b,stroke-width:3px
    style C fill:#dcfce7,stroke:#16a34a,stroke-width:2px
    style D fill:#dcfce7,stroke:#16a34a,stroke-width:2px
    style E fill:#dcfce7,stroke:#16a34a,stroke-width:2px
    style F fill:#dcfce7,stroke:#16a34a,stroke-width:2px
    style G fill:#f3e8ff,stroke:#a855f7,stroke-width:2px

CDC sits between your database and everything else, automatically keeping them in sync.
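One change stream feeding several systems can be sketched as a simple fan-out loop. Everything here is hypothetical: the `ChangeEvent` shape and the three in-memory sink classes stand in for Redis, Elasticsearch, and a warehouse loader, and real pipelines typically put Kafka between the CDC tool and each consumer.

```python
# Fan-out sketch: one ordered stream of change events delivered to several
# sinks. ChangeEvent and all sink classes are hypothetical in-memory stand-ins.
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class ChangeEvent:
    table: str
    key: int
    op: str              # "upsert" or "delete"
    row: Optional[dict]


class CacheSink:
    def __init__(self) -> None:
        self.data: dict[str, dict] = {}

    def apply(self, e: ChangeEvent) -> None:
        k = f"{e.table}:{e.key}"
        if e.op == "delete":
            self.data.pop(k, None)
        else:
            self.data[k] = e.row


class SearchIndexSink:
    def __init__(self) -> None:
        self.docs: dict[int, str] = {}

    def apply(self, e: ChangeEvent) -> None:
        if e.table != "products":
            return  # this index only cares about products
        if e.op == "delete":
            self.docs.pop(e.key, None)
        else:
            self.docs[e.key] = e.row["name"].lower()  # "index" a normalized name


class WarehouseSink:
    def __init__(self) -> None:
        self.history: list[ChangeEvent] = []  # append-only: keeps every change

    def apply(self, e: ChangeEvent) -> None:
        self.history.append(e)


def fan_out(events: list[ChangeEvent], sinks: list[Callable[[ChangeEvent], None]]) -> None:
    """Deliver every event, in order, to every sink."""
    for event in events:
        for apply in sinks:
            apply(event)


cache, search, warehouse = CacheSink(), SearchIndexSink(), WarehouseSink()
events = [
    ChangeEvent("products", 7, "upsert", {"name": "Blue Mug"}),
    ChangeEvent("products", 7, "upsert", {"name": "Blue Mug XL"}),
    ChangeEvent("users", 1, "upsert", {"email": "new@example.com"}),
]
fan_out(events, [cache.apply, search.apply, warehouse.apply])
```

Note how each sink keeps what it needs: the cache and search index keep only the latest state, while the warehouse keeps the full history of changes.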

Popular CDC Tools

Debezium

Open-source; works with MySQL, PostgreSQL, MongoDB, SQL Server, and more. Runs on Kafka Connect and streams changes to Kafka. Widely seen as the industry standard.
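A consumer for Debezium events mostly comes down to reading the change event envelope: `op` ("c" create, "u" update, "d" delete, "r" snapshot read), plus the row's `before` and `after` images. Below is a sketch of a cache updater under stated assumptions: the Kafka consumer is left out (messages arrive as JSON strings), the `"id"` primary-key column and the fixed `users` table name are hypothetical, and the handler accepts the payload with or without the JSON converter's `schema`/`payload` wrapper.

```python
# Cache updater for Debezium-style change events. Assumes the documented
# envelope fields ("op", "before", "after"); the "id" key column and the
# default table name are hypothetical. Kafka consumption is omitted.
import json
from typing import Optional

cache: dict[str, dict] = {}


def handle_change(message: Optional[str], table: str = "users") -> None:
    if message is None:
        return  # tombstone (null value) emitted after a delete; nothing to apply
    event = json.loads(message)
    payload = event.get("payload", event)  # with or without the schema wrapper
    op = payload["op"]
    if op in ("c", "u", "r"):  # create, update, snapshot read
        row = payload["after"]
        cache[f"{table}:{row['id']}"] = row
    elif op == "d":            # delete: only "before" is populated
        row = payload["before"]
        cache.pop(f"{table}:{row['id']}", None)
    # other ops (e.g. truncate) are ignored in this sketch


handle_change(json.dumps({"payload": {"op": "c", "before": None,
                                      "after": {"id": 1, "email": "old@example.com"}}}))
handle_change(json.dumps({"op": "u", "before": {"id": 1, "email": "old@example.com"},
                          "after": {"id": 1, "email": "new@example.com"}}))
print(cache)  # {'users:1': {'id': 1, 'email': 'new@example.com'}}
```

Debezium can follow each delete event with a tombstone record so Kafka log compaction can drop the key, which is why the handler treats a `None` message as a no-op.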

Maxwell's Daemon

Simple CDC for MySQL. Outputs JSON. Great for getting started quickly.
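Maxwell's output is flatter: one JSON object per row change with fields such as `database`, `table`, `type` ("insert", "update", "delete") and `data` (the row), plus `old` holding previous values of changed columns on updates. The sketch below assumes that format (check it against the Maxwell version you run), uses a hypothetical `"id"` primary-key column, and takes lines as strings instead of reading from Kafka or stdout.

```python
# Cache updater for Maxwell's Daemon JSON rows. Field names follow Maxwell's
# documented output; the "id" key column is a hypothetical assumption.
import json

cache: dict[str, dict] = {}


def handle_maxwell_row(line: str) -> None:
    event = json.loads(line)
    kind = event["type"]
    if kind not in ("insert", "update", "delete"):
        return  # other event types (e.g. bootstrap markers) are ignored here
    key = f"{event['database']}.{event['table']}:{event['data']['id']}"
    if kind == "delete":
        cache.pop(key, None)        # "data" holds the row that was deleted
    else:
        cache[key] = event["data"]  # "data" holds the full row after the change


handle_maxwell_row('{"database": "shop", "table": "users", "type": "insert", '
                   '"data": {"id": 1, "email": "old@example.com"}}')
handle_maxwell_row('{"database": "shop", "table": "users", "type": "update", '
                   '"data": {"id": 1, "email": "new@example.com"}, '
                   '"old": {"email": "old@example.com"}}')
print(cache)  # {'shop.users:1': {'id': 1, 'email': 'new@example.com'}}
```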

AWS DMS / Fivetran

Managed CDC services. Less setup but costs money. Good for enterprise.

When to Use CDC

Use CDC When

  • You need real-time data sync
  • Multiple systems need the same data
  • Your cache keeps getting stale
  • Building event-driven architecture
  • Database handles lots of writes
  • Manual sync is too error-prone

Skip CDC When

  • Data rarely changes
  • Slightly stale data is acceptable (cache expiry is enough)
  • Only one system uses the data
  • Simple app with no cache
  • Can't access database logs
  • Team too small to run the extra infrastructure