Every time you run a database query, a B-tree is doing the heavy lifting. That SELECT statement that returns in 2 milliseconds from a table with 50 million rows? A B-tree made that possible.
B-trees are one of those data structures that every software developer should understand. Not because you will implement one from scratch (you probably will not), but because knowing how they work helps you write better queries, design better schemas, and understand why your database does what it does.
This guide covers everything you need to know about B-trees as a software developer. No academic jargon. Just practical knowledge that will help you build faster systems.
What is a B-Tree?
A B-tree is a self-balancing tree data structure designed for systems that read and write large blocks of data. It was invented in 1970 by Rudolf Bayer and Edward McCreight at Boeing Research Labs.
The key insight behind B-trees: disk access is slow. Reading from a hard drive or SSD is thousands of times slower than reading from RAM. B-trees minimize disk access by keeping the tree very shallow and packing many keys into each node.
Here is what makes B-trees different from binary search trees:
| Property | Binary Search Tree | B-Tree |
|---|---|---|
| Children per node | At most 2 | Many (often hundreds) |
| Tree height for 1 billion keys | ~30 levels | ~3-4 levels |
| Disk reads per search | ~30 | ~3-4 |
| Optimized for | Memory | Disk storage |
A binary search tree with 1 billion records would require about 30 disk reads to find any key. A B-tree with the same data needs just 3 or 4. That difference is why every major database uses B-trees.
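You can sanity-check these numbers with two logarithms. A quick back-of-the-envelope sketch (the fan-out of 500 is an illustrative value; real fan-out depends on page size and key size):

```python
import math

n = 1_000_000_000  # one billion keys
fanout = 500       # keys per B-tree node (illustrative)

bst_levels = math.log2(n)           # binary search tree: ~30 levels
btree_levels = math.log(n, fanout)  # B-tree: ~3-4 levels

print(f"BST levels:    ~{bst_levels:.0f}")
print(f"B-tree levels: ~{btree_levels:.1f}")
```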
B-Tree Structure
A B-tree node contains multiple keys and multiple child pointers. The keys divide the child pointers into ranges.
```mermaid
graph TD
subgraph BTree["B-Tree of Order 4"]
R["Root<br/>30 | 60"]
C1["10 | 20"]
C2["40 | 50"]
C3["70 | 80 | 90"]
L1["3 | 5 | 8"]
L2["12 | 15"]
L3["22 | 25 | 28"]
L4["35 | 38"]
L5["45 | 48"]
L6["55 | 58"]
L7["65 | 68"]
L8["75 | 78"]
L9["85 | 88 | 95"]
R --> C1
R --> C2
R --> C3
C1 --> L1
C1 --> L2
C1 --> L3
C2 --> L4
C2 --> L5
C2 --> L6
C3 --> L7
C3 --> L8
C3 --> L9
end
style R fill:#e3f2fd,stroke:#1565c0,stroke-width:2px
style C1 fill:#fff3e0,stroke:#e65100,stroke-width:2px
style C2 fill:#fff3e0,stroke:#e65100,stroke-width:2px
style C3 fill:#fff3e0,stroke:#e65100,stroke-width:2px
style L1 fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px
style L2 fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px
style L3 fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px
style L4 fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px
style L5 fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px
style L6 fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px
style L7 fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px
style L8 fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px
style L9 fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px
```
In this B-tree:
- The root node has 2 keys (30, 60) and 3 children
- Keys less than 30 are in the left subtree
- Keys between 30 and 60 are in the middle subtree
- Keys greater than 60 are in the right subtree
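In code, a node is nothing more than a sorted list of keys plus a list of child pointers. Here is a minimal sketch that the later snippets in this guide build on; the ORDER constant and the constructor shape are illustrative choices matching the order-4 tree above, not a production layout:

```python
ORDER = 4  # maximum number of children per node (matches the diagram above)

class BTreeNode:
    """A single B-tree node: sorted keys plus child pointers."""

    def __init__(self, is_leaf=True):
        self.keys = []        # sorted keys stored in this node
        self.children = []    # child pointers; empty for leaf nodes
        self.is_leaf = is_leaf

    def is_full(self):
        # A node of order ORDER holds at most ORDER - 1 keys
        return len(self.keys) == ORDER - 1
```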
B-Tree Properties
For a B-tree of order m (also called degree or branching factor):
- Maximum children per node: m
- Maximum keys per node: m - 1
- Minimum children for internal nodes: ⌈m/2⌉
- Minimum keys for internal nodes: ⌈m/2⌉ - 1
- All leaf nodes are at the same level
- Keys within a node are sorted
For example, a B-tree of order 5:
- Each node can have at most 5 children and 4 keys
- Internal nodes must have at least 3 children and 2 keys
- The root can have as few as 2 children (or be a leaf)
Why These Properties Matter
These rules keep the tree balanced and efficient:
| Property | Purpose |
|---|---|
| Max children = m | Limits node size to fit in one disk block |
| Min children = ⌈m/2⌉ | Prevents tree from becoming too sparse |
| All leaves at same level | Guarantees O(log n) search time |
| Sorted keys | Enables binary search within nodes |
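These rules are easy to assert in code. A rough per-node check, building on the BTreeNode sketch above (the function name is mine, purely for illustration):

```python
import math

def check_node(node, order, is_root=False):
    """Assert the basic B-tree invariants for a single node."""
    min_children = math.ceil(order / 2)
    assert len(node.keys) <= order - 1, "too many keys"
    assert node.keys == sorted(node.keys), "keys must be sorted"
    if not node.is_leaf:
        # An internal node always has one more child than it has keys
        assert len(node.children) == len(node.keys) + 1
        # Only the root is allowed to be sparser than the minimum
        if not is_root:
            assert len(node.children) >= min_children, "node too sparse"
```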
How B-Tree Search Works
Searching in a B-tree starts at the root and works down to a leaf.
```mermaid
flowchart TD
S["Search for key 45"] --> R["Root: 30 | 60"]
R -->|"45 > 30 and 45 < 60"| C2["Node: 40 | 50"]
C2 -->|"45 > 40 and 45 < 50"| L5["Leaf: 45 | 48"]
L5 -->|"Found!"| F["Return 45"]
style S fill:#e0f2fe,stroke:#0369a1,stroke-width:2px
style R fill:#e3f2fd,stroke:#1565c0,stroke-width:2px
style C2 fill:#fff3e0,stroke:#e65100,stroke-width:2px
style L5 fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px
style F fill:#dcfce7,stroke:#16a34a,stroke-width:2px
```
Algorithm:
- Start at the root node
- Use binary search to find the key or determine which child to visit
- If key is found, return it
- If at a leaf and key not found, return “not found”
- Otherwise, follow the appropriate child pointer and repeat
Time Complexity: O(log n)
More precisely, the number of disk reads is O(log_m n) where m is the order. For a B-tree of order 1000 with 1 billion keys:
- Height = log_1000(1,000,000,000) ≈ 3
Three disk reads to find any key among a billion. That is the power of B-trees.
```python
def search(node, key):
    """Search for a key in a B-tree."""
    i = 0
    # Find the first key greater than or equal to search key
    while i < len(node.keys) and key > node.keys[i]:
        i += 1
    # If key is found in this node
    if i < len(node.keys) and key == node.keys[i]:
        return (node, i)
    # If this is a leaf node, key is not present
    if node.is_leaf:
        return None
    # Go to the appropriate child
    return search(node.children[i], key)
```
How B-Tree Insertion Works
Insertion in a B-tree maintains balance by splitting nodes when they become too full.
Step 1: Find the Correct Leaf
First, search for where the key should go. Insertions always happen at leaf nodes.
Step 2: Insert if Room
If the leaf has fewer than m-1 keys, just insert the key in sorted order. Done.
Step 3: Split if Full
If the leaf has m-1 keys (full), we need to split:
- Insert the new key (node now has m keys temporarily)
- Split the node into two nodes
- Move the middle key up to the parent
- If the parent is also full, split it too
- Repeat up to the root if necessary
```mermaid
flowchart TD
subgraph Before["Before: Insert 25 into full node"]
B1["10 | 20 | 30 | 40"]
end
subgraph After["After: Node splits, 25 moves up"]
A1["10 | 20"]
A2["30 | 40"]
A3["Parent gets 25"]
A3 --> A1
A3 --> A2
end
Before --> After
style B1 fill:#fee2e2,stroke:#dc2626,stroke-width:2px
style A1 fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px
style A2 fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px
style A3 fill:#e3f2fd,stroke:#1565c0,stroke-width:2px
```
How Tree Height Increases
The only way a B-tree gets taller is when the root splits. When this happens:
- The root splits into two nodes
- A new root is created with one key (the middle key)
- The new root has two children (the split nodes)
This is different from binary search trees where height increases at the leaves. B-trees grow upward from the root.
```python
def insert(root, key):
    """Insert a key into a B-tree and return the (possibly new) root."""
    if root.is_full():
        # The root is full, so the tree grows in height
        new_root = BTreeNode()
        new_root.is_leaf = False  # the new root is an internal node
        new_root.children.append(root)
        split_child(new_root, 0)
        insert_non_full(new_root, key)
        return new_root
    else:
        insert_non_full(root, key)
        return root


def insert_non_full(node, key):
    """Insert key into a node that is not full."""
    i = len(node.keys) - 1
    if node.is_leaf:
        # Insert key in sorted position, shifting larger keys right
        node.keys.append(None)
        while i >= 0 and key < node.keys[i]:
            node.keys[i + 1] = node.keys[i]
            i -= 1
        node.keys[i + 1] = key
    else:
        # Find child to insert into
        while i >= 0 and key < node.keys[i]:
            i -= 1
        i += 1
        if node.children[i].is_full():
            split_child(node, i)
            if key > node.keys[i]:
                i += 1
        insert_non_full(node.children[i], key)
```
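The insert code above leans on a split_child helper that is not shown here. Below is one way it could look, following the same pre-emptive split style and the BTreeNode sketch from earlier; treat it as a sketch under those assumptions rather than a drop-in implementation:

```python
def split_child(parent, i):
    """Split the full child at index i; the middle key moves up into parent."""
    full = parent.children[i]
    mid = len(full.keys) // 2  # index of the middle key

    # The new right node takes the upper half of the keys (and children)
    right = BTreeNode(is_leaf=full.is_leaf)
    right.keys = full.keys[mid + 1:]
    if not full.is_leaf:
        right.children = full.children[mid + 1:]
        full.children = full.children[:mid + 1]

    middle_key = full.keys[mid]
    full.keys = full.keys[:mid]  # the lower half stays where it was

    # Hook the middle key and the new right node into the parent
    parent.keys.insert(i, middle_key)
    parent.children.insert(i + 1, right)
```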
How B-Tree Deletion Works
Deletion is the most complex B-tree operation because we must maintain the minimum key requirement.
Case 1: Key in Leaf with Extra Keys
If the leaf has more than the minimum keys, just remove the key. Simple.
Case 2: Key in Leaf with Minimum Keys
If the leaf has exactly the minimum keys, we need to borrow or merge:
Borrow from sibling: If a sibling has extra keys, rotate a key through the parent.
```mermaid
flowchart LR
subgraph Before["Before: Delete 5, node has minimum keys"]
P1["Parent: 10"]
L1["5 | 8"]
R1["12 | 15 | 18"]
P1 --> L1
P1 --> R1
end
subgraph After["After: Borrow from right sibling"]
P2["Parent: 12"]
L2["8 | 10"]
R2["15 | 18"]
P2 --> L2
P2 --> R2
end
Before --> After
style L1 fill:#fee2e2,stroke:#dc2626,stroke-width:2px
style R1 fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px
style L2 fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px
style R2 fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px
```
Merge with sibling: If both siblings have minimum keys, merge the node with a sibling and pull down the parent key.
Case 3: Key in Internal Node
If the key is in an internal node:
- If left child has extra keys: Replace key with predecessor (largest key in left subtree) and delete predecessor
- If right child has extra keys: Replace key with successor (smallest key in right subtree) and delete successor
- If both children have minimum keys: Merge children and delete key from merged node
Deletion can cause merges to cascade up to the root. If the root ends up with zero keys, the tree shrinks in height.
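To make the borrow case concrete, here is a rough sketch of that rotation through the parent. The helper name borrow_from_right is mine, and a complete delete also needs the merge and recursive cases:

```python
def borrow_from_right(parent, i):
    """Fix an underfull child at index i by borrowing from its right sibling."""
    child = parent.children[i]
    right = parent.children[i + 1]

    # The separating key in the parent moves down into the child...
    child.keys.append(parent.keys[i])
    # ...and the right sibling's smallest key moves up to replace it
    parent.keys[i] = right.keys.pop(0)

    # For internal nodes, the sibling's leftmost child moves across too
    if not right.is_leaf:
        child.children.append(right.children.pop(0))
```

Run it against the diagram above: the parent key 10 drops into the left node and the sibling's 12 moves up, giving exactly the "after" picture.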
B-Tree vs B+ Tree
Most databases actually use B+ trees, not plain B-trees. The difference matters.
```mermaid
flowchart TB
subgraph BTree["B-Tree"]
BR["Root: 50"]
BL["Data: 30 | Data"]
BM["Data: 50 | Data"]
BRR["Data: 70 | Data"]
BR --> BL
BR --> BM
BR --> BRR
end
subgraph BPlusTree["B+ Tree"]
PR["Root: 50 (key only)"]
PL["30 | Data"]
PM["50 | Data"]
PRR["70 | Data"]
PR --> PL
PR --> PM
PR --> PRR
PL -->|"Linked"| PM
PM -->|"Linked"| PRR
end
style BR fill:#fff3e0,stroke:#e65100,stroke-width:2px
style PR fill:#e3f2fd,stroke:#1565c0,stroke-width:2px
style PL fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px
style PM fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px
style PRR fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px
```
| Feature | B-Tree | B+ Tree |
|---|---|---|
| Data storage | All nodes | Leaf nodes only |
| Internal nodes | Keys + data | Keys only |
| Leaf linking | Not linked | Linked list |
| Range queries | Slower | Fast (follow leaf links) |
| Duplicate keys | Harder | Easier (all in leaves) |
| Space for keys | Less (data takes room) | More (internal nodes smaller) |
Why B+ trees win for databases:
- More keys per node: Since internal nodes store only keys (no data), they can fit more keys. More keys = fewer levels = fewer disk reads.
- Faster range queries: All data is in leaves, and leaves are linked. To get all values between 10 and 50, find 10 in a leaf, then follow links until you pass 50.
- More predictable performance: Every search goes to a leaf. No lucky early matches in internal nodes.
PostgreSQL, MySQL InnoDB, SQLite, and most other databases use B+ trees.
B-Trees in Databases
Understanding how databases use B-trees helps you write better queries.
How Indexes Work
When you create an index in a database, it builds a B-tree (or B+ tree):
```sql
CREATE INDEX idx_users_email ON users(email);
```
This creates a B-tree where:
- Keys are email addresses (sorted)
- Values are pointers to the actual rows (row IDs or primary keys)
```mermaid
flowchart LR
subgraph Index["B+ Tree Index on Email"]
IR["d... | m..."]
IL["alice@... | bob@... | charlie@..."]
IM["david@... | eve@... | frank@..."]
IRR["mary@... | zack@..."]
IR --> IL
IR --> IM
IR --> IRR
end
subgraph Table["Table Data"]
T1["Row 1: alice@..., Alice, ..."]
T2["Row 2: bob@..., Bob, ..."]
T3["Row 3: charlie@..., Charlie, ..."]
end
IL -->|"Pointer"| T1
IL -->|"Pointer"| T2
IL -->|"Pointer"| T3
style IR fill:#e3f2fd,stroke:#1565c0,stroke-width:2px
style IL fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px
style IM fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px
style IRR fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px
```
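You can watch a real database pick up such an index using SQLite from Python. A small sketch (the table and data are made up for illustration, and the exact EXPLAIN QUERY PLAN wording varies between SQLite versions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [("alice@example.com", "Alice"), ("bob@example.com", "Bob")],
)
conn.execute("CREATE INDEX idx_users_email ON users(email)")

# Ask the planner how it would run the lookup; expect a step
# that mentions idx_users_email rather than a full table scan.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM users WHERE email = ?",
    ("alice@example.com",),
).fetchall()
print(plan)
```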
Why Index Order Matters for Composite Indexes
For a composite index on (last_name, first_name, age):
```sql
CREATE INDEX idx_name_age ON users(last_name, first_name, age);
```
The B-tree sorts by last_name first, then first_name, then age. This means:
```sql
-- Uses the index (leftmost prefix)
SELECT * FROM users WHERE last_name = 'Smith';

-- Uses the index (first two columns)
SELECT * FROM users WHERE last_name = 'Smith' AND first_name = 'John';

-- Uses the full index
SELECT * FROM users WHERE last_name = 'Smith' AND first_name = 'John' AND age = 30;

-- Does NOT use the index efficiently (skips last_name)
SELECT * FROM users WHERE first_name = 'John';
```
This is called the leftmost prefix rule. The B-tree is sorted by the full composite key, so a query can only use it efficiently when it constrains a leftmost prefix of the indexed columns.
For more on database indexing, check out the Database Indexing Explained guide.
How Range Queries Work
B+ trees excel at range queries because of leaf linking:
```sql
SELECT * FROM orders WHERE order_date BETWEEN '2025-01-01' AND '2025-12-31';
```
The database:
- Uses the B-tree to find the first matching leaf (order_date = ‘2025-01-01’)
- Follows leaf links until order_date > ‘2025-12-31’
- Returns all matching rows
No need to traverse back up and down the tree. Just follow the chain.
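In code, the chain walk looks something like this. A minimal sketch, assuming B+ tree leaves that carry their values and a next pointer to the right sibling; that layout is illustrative, not how any particular database stores its pages:

```python
def range_scan(start_leaf, low, high):
    """Yield (key, value) pairs with low <= key <= high.

    start_leaf is the leaf containing the first key >= low,
    found by a normal root-to-leaf search.
    """
    leaf = start_leaf
    while leaf is not None:
        for key, value in zip(leaf.keys, leaf.values):
            if key > high:
                return            # past the end of the range: stop
            if key >= low:
                yield key, value
        leaf = leaf.next          # follow the leaf link to the right
```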
B-Tree Time Complexity
All B-tree operations have logarithmic time complexity:
| Operation | Time Complexity | Disk Reads |
|---|---|---|
| Search | O(log n) | O(log_m n) |
| Insert | O(log n) | O(log_m n) |
| Delete | O(log n) | O(log_m n) |
| Range query | O(log n + k) | O(log_m n + k/m) |
Where:
- n = number of keys
- m = order of the tree (max children)
- k = number of keys in range
Space Complexity: O(n)
Real Numbers
For a B-tree of order 500 (common in databases):
| Keys | Tree Height | Max Disk Reads |
|---|---|---|
| 1,000 | 2 | 2 |
| 1,000,000 | 3 | 3 |
| 1,000,000,000 | 4 | 4 |
A billion keys, 4 disk reads. This is why B-trees dominate database systems.
When to Use B-Trees
Good Use Cases
- Database indexes
- File systems (NTFS, HFS+, ext4)
- Key-value stores with disk storage
- Range queries on sorted data
- Systems requiring sorted traversal
- Any large dataset on secondary storage
Consider Alternatives
- Small datasets (array or hash table is simpler)
- In-memory only (red-black tree may be faster)
- No range queries needed (hash table is O(1))
- Write-heavy workloads (LSM trees might be better)
- Full-text search (inverted index is better)
Real World Examples
SQLite
SQLite uses B+ trees for everything. Tables are stored as B+ trees with the rowid as the key. Indexes are separate B+ trees with the indexed column as key and rowid as value.
PostgreSQL
PostgreSQL’s default index type is a B-tree. It also supports other types (hash, GiST, GIN), but B-tree covers most use cases. The implementation is described in PostgreSQL’s B-tree documentation.
MySQL InnoDB
InnoDB uses B+ trees for both the clustered index (primary key) and secondary indexes. The clustered index stores the actual row data in leaf nodes. Secondary index leaf nodes store the primary key, requiring a second B-tree lookup to get the row.
File Systems
NTFS, HFS+, ext4, and most modern file systems use B-trees or variants to organize directory entries and file metadata. This allows quick file lookups even in directories with thousands of files.
Common Interview Questions
Q: Why use B-trees instead of binary search trees?
B-trees minimize disk access by packing many keys per node, reducing tree height. A binary tree needs log_2(n) levels while a B-tree needs log_m(n) levels where m can be hundreds or thousands.
Q: What is the minimum and maximum height of a B-tree?
Minimum height: ⌈log_m(n + 1)⌉. Maximum height: ⌊log_{⌈m/2⌉}((n + 1)/2)⌋ + 1. Here n is the number of keys, m is the order, and height is counted in levels.
In practice, B-trees stay very shallow.
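Plugging numbers into these formulas is a quick way to check them:

```python
import math

def btree_height_bounds(n, m):
    """Return (min, max) number of levels for n keys and order m."""
    t = math.ceil(m / 2)
    min_levels = math.ceil(math.log(n + 1, m))
    max_levels = math.floor(math.log((n + 1) / 2, t)) + 1
    return min_levels, max_levels

# For a billion keys and order 500, both bounds come out to 4 levels,
# matching the table in the complexity section above.
print(btree_height_bounds(1_000_000_000, 500))
```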
Q: Why do all leaves have to be at the same level?
This guarantees that every search takes the same number of steps (worst case = height). It is what makes B-trees predictable and balanced.
Q: Can B-trees have duplicate keys?
In standard B-trees, no. But databases handle duplicates by either:
- Storing multiple row pointers per key
- Appending a unique identifier to make keys unique
- Using a B+ tree variation that allows duplicates in leaves
Q: What happens during high write loads?
Many insertions can cause frequent splits. Deletions can cause merges. This is why some systems use LSM trees (Log-Structured Merge Trees) for write-heavy workloads. LSM trees batch writes and are faster for inserts, though slower for reads.
Key Takeaways
- B-trees minimize disk access. By packing many keys per node, they keep tree height low. 3-4 disk reads to find any key among billions.
- All operations are O(log n). Search, insert, and delete all maintain logarithmic performance.
- Most databases use B+ trees. Data only in leaves + leaf linking = faster range queries.
- Understand the leftmost prefix rule. Composite indexes can only be traversed left to right. Design indexes based on your query patterns.
- Insertions split nodes upward. The tree grows from the root, keeping all leaves at the same level.
- Deletions may require borrowing or merging. Maintaining minimum keys can cascade changes up the tree.
- Order matters for performance. Higher order = fewer levels = fewer disk reads, but larger nodes.
- B-trees are everywhere. Databases, file systems, key-value stores. Understanding them helps you work with these systems effectively.
Further Reading:
- Database Indexing Explained - How indexes use B-trees in practice
- Skip List Explained - A simpler alternative for in-memory sorted data (used by Redis and LevelDB)
- Graph Data Structure Explained - Another fundamental data structure
- Bloom Filter Explained - When you need fast membership testing
- Use The Index, Luke - Excellent resource on SQL indexing and B-trees
- PostgreSQL B-tree Documentation
- SQLite B-tree Implementation
Building systems that handle large datasets? Understanding B-trees is just the start. Check out How Kafka Works to learn about handling high-throughput data streams, or Caching Strategies Explained to reduce database load.