OpenAI shared how they scale PostgreSQL to handle 800 million ChatGPT users. This is one of the largest PostgreSQL deployments in the world, and the lessons apply to any team scaling a relational database.
Most applications never hit this scale. But the techniques OpenAI uses work at every level. Whether you have 10,000 users or 10 million, these patterns will make your database faster and more reliable.
This post breaks down the key scaling strategies with practical code examples. No fluff. Just the stuff that works. For a broader overview, see the System Design Cheat Sheet.
The Challenge: 800 Million Users on PostgreSQL
ChatGPT is one of the fastest-growing applications in history. OpenAI needed their database to handle:
| Metric | Scale |
|---|---|
| Active users | 800+ million |
| Concurrent connections | Millions |
| Requests per second | Very high |
| Data growth | Massive |
Most teams would reach for a NoSQL database at this point. OpenAI stuck with PostgreSQL. Why? Because PostgreSQL is battle-tested, has excellent tooling, and with the right architecture, it scales further than most people think.
Strategy 1: Connection Pooling with PgBouncer
The first bottleneck at scale is not query speed. It is connections.
The Problem
Every PostgreSQL connection costs memory. A single connection can use 10MB or more. If you have 1,000 application servers each opening 10 connections, that is 10,000 connections consuming 100GB of memory just for connection overhead.
PostgreSQL was not designed for thousands of concurrent connections. It handles hundreds well. Thousands badly. Tens of thousands not at all.
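A quick way to see how much headroom you have is to compare in-flight connections against the configured ceiling:
-- Connections in use vs. the configured ceiling
SELECT count(*) AS connections_in_use,
       current_setting('max_connections')::int AS max_connections
FROM pg_stat_activity;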
The Solution: PgBouncer
PgBouncer is a lightweight connection pooler that sits between your application and PostgreSQL. Instead of each app instance maintaining its own connections, they all talk to PgBouncer. PgBouncer maintains a much smaller pool of actual database connections.
flowchart LR
subgraph apps[Application Servers]
A1[fa:fa-server App 1]
A2[fa:fa-server App 2]
A3[fa:fa-server App 3]
A4[fa:fa-server App N]
end
subgraph pooler[Connection Pooler]
PG[fa:fa-water PgBouncer]
end
subgraph database[PostgreSQL]
DB[(fa:fa-database Primary)]
end
A1 --> PG
A2 --> PG
A3 --> PG
A4 --> PG
PG --> DB
style A1 fill:#e3f2fd,stroke:#1565c0,stroke-width:2px
style A2 fill:#e3f2fd,stroke:#1565c0,stroke-width:2px
style A3 fill:#e3f2fd,stroke:#1565c0,stroke-width:2px
style A4 fill:#e3f2fd,stroke:#1565c0,stroke-width:2px
style PG fill:#fff3e0,stroke:#e65100,stroke-width:2px
style DB fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px
Without PgBouncer: 1,000 app instances × 10 connections = 10,000 database connections
With PgBouncer: 1,000 app instances → PgBouncer → 200 database connections
That is a 50x reduction in database connections.
PgBouncer Configuration
Here is a basic PgBouncer configuration:
[databases]
myapp = host=localhost port=5432 dbname=myapp
[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
; Pool mode: transaction is best for web apps
pool_mode = transaction
; Connection settings
; Max client connections from app servers to PgBouncer
max_client_conn = 10000
; Connections from PgBouncer to PostgreSQL (per database/user pair)
default_pool_size = 100
; Keep at least this many server connections open
min_pool_size = 10
; Extra server connections for burst traffic
reserve_pool_size = 5
Pool Modes Explained
PgBouncer has three pool modes. Choosing the right one matters:
| Mode | How It Works | Use When |
|---|---|---|
| session | Connection stays with client for entire session | Legacy apps that use session state |
| transaction | Connection assigned per transaction | Most web applications |
| statement | Connection assigned per statement | Only for autocommit queries |
Use transaction mode for most applications. It gives you the best connection reuse while still supporting transactions. The trade-off: session-level features such as server-side prepared statements, SET commands, advisory locks, and LISTEN/NOTIFY do not work reliably, because consecutive transactions from the same client can land on different server connections.
Practical Example: Django with PgBouncer
# settings.py
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'HOST': 'pgbouncer.internal',  # Point to PgBouncer, not PostgreSQL
        'PORT': '6432',                # PgBouncer port
        'NAME': 'myapp',
        'USER': 'myapp_user',
        'PASSWORD': 'secret',
        'CONN_MAX_AGE': 0,             # Let PgBouncer manage pooling
        'OPTIONS': {
            'options': '-c statement_timeout=30000'  # 30 second timeout
        }
    }
}
Important: Set CONN_MAX_AGE = 0 when using PgBouncer. Django's persistent connections fight with PgBouncer's transaction pooling, so let PgBouncer own connection reuse. Also note that passing startup options like the statement_timeout above typically requires ignore_startup_parameters = options in pgbouncer.ini, otherwise PgBouncer rejects the connection.
Strategy 2: Read Replicas
Most applications are read-heavy. ChatGPT users read their conversation history far more often than they create new messages. OpenAI uses read replicas to handle this.
How Read Replicas Work
flowchart LR
subgraph App[" "]
direction TB
W[✏️ Writes]
R[📖 Reads]
end
P[(🗄️ Primary)]
subgraph Replicas[" "]
direction TB
R1[(Replica 1)]
R2[(Replica 2)]
R3[(Replica 3)]
end
W -->|write| P
R -->|read| Replicas
P -.->|sync| R1
P -.->|sync| R2
P -.->|sync| R3
style W fill:#fee2e2,stroke:#dc2626,stroke-width:2px,color:#991b1b
style R fill:#dcfce7,stroke:#16a34a,stroke-width:2px,color:#166534
style P fill:#fef3c7,stroke:#d97706,stroke-width:2px,color:#92400e
style R1 fill:#dbeafe,stroke:#2563eb,stroke-width:2px,color:#1e40af
style R2 fill:#dbeafe,stroke:#2563eb,stroke-width:2px,color:#1e40af
style R3 fill:#dbeafe,stroke:#2563eb,stroke-width:2px,color:#1e40af
- Writes go to the primary database
- Reads go to the replicas
- Changes on the primary replicate to all replicas
This separation of reads and writes is a core principle in distributed systems. For a formalized version of this pattern, see CQRS Pattern Guide.
Setting Up PostgreSQL Streaming Replication
On the primary server:
# postgresql.conf on primary
wal_level = replica
max_wal_senders = 10
wal_keep_size = 1GB
-- Create replication user
CREATE USER replicator WITH REPLICATION ENCRYPTED PASSWORD 'secret';
On the replica:
# Stop PostgreSQL
sudo systemctl stop postgresql
# Remove existing data
rm -rf /var/lib/postgresql/15/main/*
# Copy data from primary
pg_basebackup -h primary.internal -U replicator -D /var/lib/postgresql/15/main -P -R
The -R flag creates a standby.signal file and configures the replica automatically.
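To confirm replication is actually streaming, check both sides with PostgreSQL's built-in views (a quick sanity check, nothing more):
-- On the primary: one row per connected replica
SELECT client_addr, state, sync_state FROM pg_stat_replication;
-- On the replica: returns true while the server is in standby mode
SELECT pg_is_in_recovery();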
Routing Reads and Writes in Your Application
The application needs to know which queries go to which database. Here is a simple Python example:
import random
from contextlib import contextmanager

class DatabaseRouter:
    def __init__(self, primary_conn, replica_conns):
        self.primary = primary_conn
        self.replicas = replica_conns

    @contextmanager
    def read_connection(self):
        """Get a random read replica connection."""
        conn = random.choice(self.replicas)
        try:
            yield conn
        finally:
            conn.commit()

    @contextmanager
    def write_connection(self):
        """Get the primary connection for writes."""
        try:
            yield self.primary
        finally:
            self.primary.commit()

# Usage
router = DatabaseRouter(primary, [replica1, replica2, replica3])

# Read operations
with router.read_connection() as conn:
    cursor = conn.cursor()
    cursor.execute("SELECT * FROM users WHERE id = %s", [user_id])
    user = cursor.fetchone()

# Write operations
with router.write_connection() as conn:
    cursor = conn.cursor()
    cursor.execute("INSERT INTO messages (content) VALUES (%s)", [content])
Handling Replication Lag
Replicas are slightly behind the primary. This is called replication lag. For most reads, a few milliseconds of lag does not matter. But sometimes you need to read your own writes immediately. For more on consistency in distributed systems, see Majority Quorum in Distributed Systems.
def get_user_after_update(user_id, use_primary=False):
    """
    Get user data. Use primary if we just wrote and need fresh data.
    """
    if use_primary:
        conn = router.primary
    else:
        conn = random.choice(router.replicas)
    cursor = conn.cursor()
    cursor.execute("SELECT * FROM users WHERE id = %s", [user_id])
    return cursor.fetchone()

# After an update, read from primary to get fresh data
update_user(user_id, new_email)
user = get_user_after_update(user_id, use_primary=True)
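It also helps to measure the lag itself rather than guessing. On PostgreSQL 10 and later the primary reports per-replica lag, and the replica can report how far behind its replay is:
-- On the primary (PostgreSQL 10+): lag per replica
SELECT application_name, write_lag, flush_lag, replay_lag
FROM pg_stat_replication;
-- On the replica: time since the last replayed transaction
-- (this can look large while the primary is idle)
SELECT now() - pg_last_xact_replay_timestamp() AS replay_delay;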
Strategy 3: Horizontal Sharding
When a single PostgreSQL instance cannot handle your data or traffic, you split the data across multiple instances. This is horizontal sharding.
How Sharding Works
flowchart TB
subgraph app[Application Layer]
SR[fa:fa-random Shard Router]
end
subgraph shards[Database Shards]
S1[(fa:fa-database Shard 1<br/>Users 1-1M)]
S2[(fa:fa-database Shard 2<br/>Users 1M-2M)]
S3[(fa:fa-database Shard 3<br/>Users 2M-3M)]
S4[(fa:fa-database Shard N<br/>Users 3M+)]
end
SR --> S1
SR --> S2
SR --> S3
SR --> S4
style SR fill:#fff3e0,stroke:#e65100,stroke-width:2px
style S1 fill:#e3f2fd,stroke:#1565c0,stroke-width:2px
style S2 fill:#e3f2fd,stroke:#1565c0,stroke-width:2px
style S3 fill:#e3f2fd,stroke:#1565c0,stroke-width:2px
style S4 fill:#e3f2fd,stroke:#1565c0,stroke-width:2px
Each shard is a complete PostgreSQL instance with the same schema. Data is partitioned based on a shard key, typically user_id or tenant_id.
Choosing a Shard Key
The shard key determines which shard holds which data. Choose wisely:
| Good Shard Keys | Bad Shard Keys |
|---|---|
| user_id | created_at (hot spots) |
| tenant_id | country (uneven distribution) |
| organization_id | status (low cardinality) |
The best shard key is one that:
- Distributes data evenly
- Keeps related data together
- Is included in most queries
Implementing a Shard Router
import hashlib

class ShardRouter:
    def __init__(self, shard_connections):
        """
        shard_connections: dict of shard_id -> connection
        """
        self.shards = shard_connections
        self.num_shards = len(shard_connections)

    def get_shard_id(self, user_id):
        """Hash the user_id to pick a shard (note: adding shards later
        changes the mapping, so plan for rebalancing)."""
        hash_value = int(hashlib.md5(str(user_id).encode()).hexdigest(), 16)
        return hash_value % self.num_shards

    def get_connection(self, user_id):
        """Get database connection for a user."""
        shard_id = self.get_shard_id(user_id)
        return self.shards[shard_id]

    def execute_on_shard(self, user_id, query, params):
        """Execute query on the correct shard and return the cursor."""
        conn = self.get_connection(user_id)
        cursor = conn.cursor()
        cursor.execute(query, params)
        return cursor

# Usage
router = ShardRouter({
    0: shard0_connection,
    1: shard1_connection,
    2: shard2_connection,
    3: shard3_connection,
})

# Query automatically goes to the right shard
cursor = router.execute_on_shard(
    user_id=12345,
    query="SELECT * FROM messages WHERE user_id = %s",
    params=[12345]
)
messages = cursor.fetchall()
Cross-Shard Queries
Sometimes you need to query across all shards. This is expensive but sometimes necessary:
def get_total_message_count():
    """Query all shards and aggregate results."""
    total = 0
    for shard_id, conn in router.shards.items():
        cursor = conn.cursor()
        cursor.execute("SELECT COUNT(*) FROM messages")
        total += cursor.fetchone()[0]
    return total
Avoid cross-shard queries in hot paths. They do not scale. If you need aggregated data frequently, maintain it separately (in a cache or analytics database).
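One common approach is to maintain the aggregate at write time instead of computing it at read time. Here is a minimal sketch using a Redis counter; the key name and client setup are illustrative, and the router is the one from the example above:
import redis
cache = redis.Redis(host='redis.internal', port=6379)

def record_new_message(user_id, content):
    # Write the row to the correct shard as usual
    router.execute_on_shard(
        user_id=user_id,
        query="INSERT INTO messages (user_id, content) VALUES (%s, %s)",
        params=[user_id, content],
    )
    # Keep a global counter current so reads never fan out across shards
    cache.incr("stats:total_messages")

def get_total_message_count_fast():
    # O(1) read instead of a cross-shard COUNT(*)
    value = cache.get("stats:total_messages")
    return int(value) if value else 0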
Sharding Tools
You do not have to build sharding from scratch. These tools handle the complexity:
| Tool | Type | Best For |
|---|---|---|
| Citus | PostgreSQL extension | Transparent sharding, stays within PostgreSQL ecosystem |
| Vitess | Middleware proxy | Large-scale MySQL sharding (used by YouTube and Slack); MySQL-compatible databases only, not PostgreSQL |
| pgcat | Connection pooler | Lightweight sharding with pooling built-in |
| pg_partman | PostgreSQL extension | Time-based partitioning (not true sharding but helps) |
Citus is the most popular choice for PostgreSQL. It turns PostgreSQL into a distributed database:
-- Create a distributed table with Citus
SELECT create_distributed_table('messages', 'user_id');
-- Queries automatically route to the correct shard
SELECT * FROM messages WHERE user_id = 12345;
For most teams: Start with application-level sharding (the Python router shown above) to understand your access patterns. Move to Citus when managing shards manually becomes painful.
Strategy 4: Query Optimization
At scale, bad queries become very bad queries. A query that takes 100ms with 1,000 rows might take 10 seconds with 10 million rows.
Using EXPLAIN ANALYZE
Always analyze slow queries:
EXPLAIN ANALYZE
SELECT m.*, u.name
FROM messages m
JOIN users u ON m.user_id = u.id
WHERE m.created_at > '2026-01-01'
ORDER BY m.created_at DESC
LIMIT 100;
Look for:
- Seq Scan on large tables (usually means a missing index)
- Estimated row counts far from actual row counts (statistics are stale; run ANALYZE)
- High Rows Removed by Filter (the query scans far more data than it returns)
Index Strategies
Create indexes for your query patterns:
-- For filtering by user and time
CREATE INDEX idx_messages_user_created
ON messages(user_id, created_at DESC);
-- For covering queries (includes all columns needed)
CREATE INDEX idx_messages_covering
ON messages(user_id, created_at DESC)
INCLUDE (content, status);
-- Partial index for active records only
CREATE INDEX idx_messages_active
ON messages(user_id, created_at)
WHERE status = 'active';
See How Database Indexing Works for a deep dive on index types and when to use each. For the underlying data structure, see B-Tree Data Structure Explained.
Query Patterns That Kill Performance
Bad: Using functions on indexed columns
-- Index on created_at will NOT be used
SELECT * FROM messages
WHERE DATE(created_at) = '2026-01-15';
-- Better: Keep the column bare
SELECT * FROM messages
WHERE created_at >= '2026-01-15'
AND created_at < '2026-01-16';
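If you genuinely need to filter by calendar date, an expression index is another option. The sketch below assumes created_at is timestamp without time zone; with timestamptz the cast is not immutable and you would need a timezone-explicit expression instead:
-- Index the expression itself so date-equality lookups can use it
CREATE INDEX idx_messages_created_date
ON messages ((created_at::date));

-- This predicate matches the indexed expression
SELECT * FROM messages
WHERE created_at::date = '2026-01-15';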
Bad: SELECT *
-- Fetches all columns, even unused ones
SELECT * FROM users WHERE id = 123;
-- Better: Only fetch what you need
SELECT id, name, email FROM users WHERE id = 123;
Bad: N+1 queries
# BAD: N+1 queries
users = db.execute("SELECT * FROM users LIMIT 100")
for user in users:
    messages = db.execute(
        "SELECT * FROM messages WHERE user_id = %s",
        [user.id]
    )
# GOOD: Single query with JOIN
results = db.execute("""
    SELECT u.*, m.*
    FROM users u
    LEFT JOIN messages m ON m.user_id = u.id
    WHERE u.id IN (SELECT id FROM users LIMIT 100)
""")
See N+1 Query Problem Explained for more on this common performance killer.
Strategy 5: Connection Management
At massive scale, connection management becomes critical. Here is what matters:
Connection Timeouts
Set aggressive timeouts to prevent runaway queries:
# In postgresql.conf
statement_timeout = '30s'                      # Kill queries after 30 seconds
idle_in_transaction_session_timeout = '60s'    # Kill transactions left idle
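Global timeouts are a blunt instrument. You can also scope them to a role or a single transaction, which helps when one service legitimately needs longer-running queries than the rest (the role name below matches the earlier example config; adjust to your setup):
-- Per-role default: every new session for this role inherits the timeout
ALTER ROLE myapp_user SET statement_timeout = '30s';

-- Per-transaction override for a known long-running job
BEGIN;
SET LOCAL statement_timeout = '5min';
-- ... run the expensive query here ...
COMMIT;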
Connection Limits
Configure limits to prevent overload:
# In postgresql.conf
max_connections = 200    # Actual connections to PostgreSQL
With PgBouncer in front:
# pgbouncer.ini
; Connections from apps to PgBouncer
max_client_conn = 10000
; Connections from PgBouncer to PostgreSQL
default_pool_size = 100
Monitoring Connections
Keep an eye on connection usage:
-- Current connections by state
SELECT state, COUNT(*)
FROM pg_stat_activity
GROUP BY state;
-- Long-running queries
SELECT pid, now() - pg_stat_activity.query_start AS duration, query
FROM pg_stat_activity
WHERE state = 'active'
AND now() - pg_stat_activity.query_start > interval '1 minute';
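When monitoring turns up a runaway query, you can stop it without restarting anything. pg_cancel_backend cancels the current query; pg_terminate_backend closes the whole connection:
-- Cancel every query that has been running for more than 5 minutes
SELECT pg_cancel_backend(pid)
FROM pg_stat_activity
WHERE state = 'active'
  AND now() - query_start > interval '5 minutes';
-- pg_terminate_backend(pid) goes further and drops the connection entirely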
Strategy 6: Caching
Not every request needs to hit the database. Caching reduces load significantly.
Application-Level Caching with Redis
import redis
import json

cache = redis.Redis(host='redis.internal', port=6379)

def get_user(user_id):
    # Try cache first
    cache_key = f"user:{user_id}"
    cached = cache.get(cache_key)
    if cached:
        return json.loads(cached)

    # Cache miss: query database
    user = db.execute(
        "SELECT * FROM users WHERE id = %s",
        [user_id]
    ).fetchone()

    # Store in cache for 5 minutes
    cache.setex(cache_key, 300, json.dumps(user))
    return user

def update_user(user_id, data):
    # Update database
    db.execute(
        "UPDATE users SET name = %s WHERE id = %s",
        [data['name'], user_id]
    )
    # Invalidate cache
    cache.delete(f"user:{user_id}")
Query Result Caching
For expensive queries that do not change often:
def get_popular_messages():
    cache_key = "popular_messages"
    cached = cache.get(cache_key)
    if cached:
        return json.loads(cached)

    # Expensive aggregation query
    results = db.execute("""
        SELECT m.*, COUNT(r.id) as reaction_count
        FROM messages m
        LEFT JOIN reactions r ON r.message_id = m.id
        WHERE m.created_at > NOW() - INTERVAL '7 days'
        GROUP BY m.id
        ORDER BY reaction_count DESC
        LIMIT 100
    """).fetchall()

    # Cache for 10 minutes
    cache.setex(cache_key, 600, json.dumps(results))
    return results
For a deeper look at caching patterns, see Caching Strategies Explained.
Strategy 7: Monitoring and Observability
You cannot fix what you cannot see. At scale, monitoring is essential.
Key Metrics to Track
These metrics form the foundation of your database SLIs. For the full framework on setting reliability targets, see SLI, SLO, SLA Explained.
| Metric | Why It Matters |
|---|---|
| Connections | Approaching limits causes failures |
| Query latency (p50, p95, p99) | User experience |
| Replication lag | Data freshness on replicas |
| Cache hit rate | Database load |
| Slow query count | Performance regressions |
| Disk I/O | Storage bottlenecks |
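Several of these come straight out of PostgreSQL's statistics views. For example, a rough buffer cache hit ratio per database (this is PostgreSQL's own shared-buffer cache, separate from Redis):
-- Shared-buffer cache hit ratio; OLTP workloads usually want this above ~0.99
SELECT datname,
       round(blks_hit::numeric / nullif(blks_hit + blks_read, 0), 4) AS cache_hit_ratio
FROM pg_stat_database
WHERE datname NOT IN ('template0', 'template1');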
pg_stat_statements for Query Analysis
Enable the pg_stat_statements extension to track query performance:
-- Enable the extension
CREATE EXTENSION pg_stat_statements;
-- Top 10 queries by total time
SELECT
query,
calls,
total_exec_time / 1000 as total_seconds,
mean_exec_time as avg_ms,
rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
-- Queries with high execution time variance
SELECT
query,
calls,
mean_exec_time as avg_ms,
stddev_exec_time as stddev_ms
FROM pg_stat_statements
WHERE calls > 100
ORDER BY stddev_exec_time DESC
LIMIT 10;
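One caveat: CREATE EXTENSION alone is not enough. The module must be preloaded at server start, which means a postgresql.conf change and a restart before the view starts filling up:
# postgresql.conf (requires a restart before pg_stat_statements collects anything)
shared_preload_libraries = 'pg_stat_statements'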
Architecture Summary
Here is the complete architecture OpenAI uses to scale PostgreSQL:
flowchart LR
subgraph Apps[fa:fa-server Application Tier]
A1[fa:fa-server App 1]
A2[fa:fa-server App 2]
A3[fa:fa-server App N]
end
subgraph Cache[fa:fa-bolt Cache Layer]
Redis[(fa:fa-database Redis)]
end
subgraph Pool[fa:fa-random Connection Pool]
PGB[fa:fa-water PgBouncer]
end
subgraph Shard1[fa:fa-layer-group Shard 1 - Users A-M]
P1[(fa:fa-database Primary)]
R1a[(fa:fa-clone Replica)]
R1b[(fa:fa-clone Replica)]
end
subgraph Shard2[fa:fa-layer-group Shard 2 - Users N-Z]
P2[(fa:fa-database Primary)]
R2a[(fa:fa-clone Replica)]
R2b[(fa:fa-clone Replica)]
end
A1 & A2 & A3 -->|cache check| Redis
A1 & A2 & A3 -->|queries| PGB
PGB -->|writes| P1
PGB -->|writes| P2
PGB -->|reads| R1a & R1b
PGB -->|reads| R2a & R2b
P1 -.->|replication| R1a
P1 -.->|replication| R1b
P2 -.->|replication| R2a
P2 -.->|replication| R2b
style A1 fill:#dbeafe,stroke:#2563eb,stroke-width:2px
style A2 fill:#dbeafe,stroke:#2563eb,stroke-width:2px
style A3 fill:#dbeafe,stroke:#2563eb,stroke-width:2px
style Redis fill:#fef3c7,stroke:#d97706,stroke-width:2px
style PGB fill:#fed7aa,stroke:#ea580c,stroke-width:2px
style P1 fill:#bbf7d0,stroke:#16a34a,stroke-width:2px
style P2 fill:#bbf7d0,stroke:#16a34a,stroke-width:2px
style R1a fill:#dcfce7,stroke:#22c55e,stroke-width:2px
style R1b fill:#dcfce7,stroke:#22c55e,stroke-width:2px
style R2a fill:#dcfce7,stroke:#22c55e,stroke-width:2px
style R2b fill:#dcfce7,stroke:#22c55e,stroke-width:2px
Request flow:
- Application checks the Redis cache first
- Cache miss → the application picks the target shard from the user_id and sends the query through PgBouncer
- PgBouncer hands the query a pooled connection to that shard
- Reads go to replicas, writes go to the primary
- Results are cached in Redis for future requests
Key Takeaways
- Start with connection pooling. PgBouncer is the easiest win. Most applications hit connection limits before query performance limits.
- Add read replicas early. Most workloads are 80%+ reads. Replicas distribute this load and give you redundancy.
- Shard when you must. Sharding adds complexity. Do it when a single PostgreSQL instance cannot handle your data or traffic, not before.
- Optimize queries first. A bad query on 10 shards is still a bad query. Fix the query patterns before scaling infrastructure.
- Cache aggressively. The fastest database query is the one you do not make. Use Redis or Memcached for frequently accessed data.
- Monitor everything. Use pg_stat_statements, track slow queries, and monitor connection usage. You cannot fix what you cannot measure. Consider performance testing with Grafana k6 before major releases.
- PostgreSQL scales further than you think. With the right architecture, PostgreSQL handles hundreds of millions of users. Do not switch to a new database just because things got busy.
Related Posts:
- How Database Indexing Works - Deep dive into B-trees and index optimization
- B-Tree Data Structure - The data structure behind PostgreSQL indexes
- Caching Strategies Explained - When and how to add caching
- N+1 Query Problem - Fix this common performance killer
- How WhatsApp Scales to Billions - Another massive scale case study
- How Uber Handles 1M Requests/Second - Geospatial scaling at extreme scale
- CQRS Pattern Guide - Formal pattern for read/write separation
- System Design Cheat Sheet - Quick reference for scaling patterns
Have questions about scaling PostgreSQL? Share your experience in the comments below.