You have a sorted linked list with a million elements. Finding any element takes O(n) time because you have to traverse from the head. That is slow.

You could use a balanced tree instead. AVL trees and red-black trees give you O(log n) search. But implementing them correctly is hard. Rotations, color flipping, height balancing. It is easy to get wrong.

What if you could get O(log n) performance with a data structure that is actually simple to implement? No rotations. No rebalancing. Just flip a coin a few times.

That is what skip lists do. And they work well enough that Redis, LevelDB, and other production systems use them for their core operations.

What is a Skip List?

A skip list is a data structure that extends a sorted linked list with multiple levels of forward pointers. Think of it as adding express lanes on top of a regular linked list.

graph LR
    subgraph SkipList["Skip List Structure"]
        direction LR
        HEAD["HEAD<br/>L0-L3"]
        N10["10<br/>L0"]
        N20["20<br/>L0-L1"]
        N30["30<br/>L0-L3"]
        N40["40<br/>L0"]
        N50["50<br/>L0-L2"]
        N60["60<br/>L0"]
        N70["70<br/>L0-L3"]
        N80["80<br/>L0"]
        NIL["NIL"]
        
        HEAD -->|"L3"| N30
        N30 -->|"L3"| N70
        N70 -->|"L3"| NIL
        
        HEAD -->|"L2"| N30
        N30 -->|"L2"| N50
        N50 -->|"L2"| N70
        
        HEAD -->|"L1"| N20
        N20 -->|"L1"| N30
        N30 -->|"L1"| N50
        N50 -->|"L1"| N70
        
        HEAD -->|"L0"| N10
        N10 -->|"L0"| N20
        N20 -->|"L0"| N30
        N30 -->|"L0"| N40
        N40 -->|"L0"| N50
        N50 -->|"L0"| N60
        N60 -->|"L0"| N70
        N70 -->|"L0"| N80
        N80 -->|"L0"| NIL
    end
    
    style HEAD fill:#e3f2fd,stroke:#1565c0,stroke-width:2px
    style N30 fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px
    style N70 fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px
    style N50 fill:#fff3e0,stroke:#e65100,stroke-width:2px
    style NIL fill:#f5f5f5,stroke:#616161,stroke-width:2px

In a regular linked list, you traverse every single node. In a skip list, you can jump over many nodes at once using the higher levels, then drop down to find the exact position.

William Pugh invented skip lists in 1989 as a simpler alternative to balanced trees. His original paper showed that skip lists perform comparably to AVL trees and 2-3 trees, but with much simpler algorithms.

The Key Insight

The genius of skip lists is using randomness instead of strict structure.

In a balanced tree, you maintain balance through careful rotations after every insertion and deletion. Get the rotation wrong, and your tree becomes unbalanced. Debugging is painful.

In a skip list, each node gets a random height when inserted. No rotations needed. No complex invariants to maintain. The randomness naturally creates a balanced structure with high probability.

How Skip List Structure Works

A skip list node contains:

  • A key (the value stored)
  • An array of forward pointers (one for each level the node participates in)
1
2
3
4
class SkipListNode:
    def __init__(self, key, level):
        self.key = key
        self.forward = [None] * (level + 1)  # Array of forward pointers

The head node spans all levels. Each regular node spans from level 0 up to its randomly assigned level.

Level Distribution

Levels are assigned randomly using a simple algorithm:

  1. Start at level 0
  2. Flip a coin
  3. If heads, increase level by 1 and flip again
  4. If tails, stop

This creates a geometric distribution where:

  • 50% of nodes have level 0 (base level only)
  • 25% have level 1
  • 12.5% have level 2
  • 6.25% have level 3
  • And so on
1
2
3
4
5
6
7
import random

def random_level(max_level, p=0.5):
    level = 0
    while random.random() < p and level < max_level:
        level += 1
    return level

The expected height of the tallest node in a skip list with n elements is O(log n). This is why search takes logarithmic time.

How Skip List Search Works

Searching in a skip list is like having express trains and local trains.

To find a value:

  1. Start at the head node at the highest level
  2. Move right while the next node’s key is less than the target
  3. When you cannot move right (next is greater or NIL), drop down one level
  4. Repeat until you find the target or reach level 0
flowchart LR
    subgraph Search["Search for 50"]
        S1["Start at HEAD<br/>Level 3"] --> S2["30 < 50<br/>Move right"]
        S2 --> S3["70 > 50<br/>Drop to Level 2"]
        S3 --> S4["50 = 50<br/>Found!"]
    end
    
    style S1 fill:#e3f2fd,stroke:#1565c0,stroke-width:2px
    style S2 fill:#fff3e0,stroke:#e65100,stroke-width:2px
    style S3 fill:#fff3e0,stroke:#e65100,stroke-width:2px
    style S4 fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px

Search Implementation

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
def search(self, target):
    current = self.head
    
    # Start from highest level and work down
    for level in range(self.level, -1, -1):
        # Move right while next node's key is less than target
        while current.forward[level] and current.forward[level].key < target:
            current = current.forward[level]
    
    # Move to level 0 candidate
    current = current.forward[0]
    
    # Check if we found the target
    if current and current.key == target:
        return current
    return None

Why Search is O(log n)

At each level, you skip over roughly half the remaining nodes. This is similar to binary search.

With probability p = 0.5:

  • Level k has about n/2^k nodes
  • You traverse at most O(1) nodes at each level before dropping down
  • There are O(log n) levels

Total comparisons: O(log n)

How Skip List Insertion Works

Insertion has two phases:

  1. Find where the new node should go (like search)
  2. Insert the node at its random level and update pointers

Step by Step Insertion

To insert value 45:

  1. Search for position, tracking nodes where you dropped down (these are the update points)
  2. Generate random level for new node
  3. Create new node and connect it at each level
flowchart LR
    subgraph Before["Before Insert"]
        direction LR
        BH["HEAD"] --> B30["30"] --> B40["40"] --> B50["50"] --> B70["70"]
    end
    
    subgraph After["After Insert 45"]
        direction LR
        AH["HEAD"] --> A30["30"] --> A40["40"] --> A45["45"] --> A50["50"] --> A70["70"]
    end
    
    Before --> After
    
    style BH fill:#e3f2fd,stroke:#1565c0,stroke-width:2px
    style B30 fill:#fff3e0,stroke:#e65100,stroke-width:2px
    style B40 fill:#fff3e0,stroke:#e65100,stroke-width:2px
    style B50 fill:#fff3e0,stroke:#e65100,stroke-width:2px
    style B70 fill:#fff3e0,stroke:#e65100,stroke-width:2px
    style AH fill:#e3f2fd,stroke:#1565c0,stroke-width:2px
    style A30 fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px
    style A40 fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px
    style A45 fill:#fce4ec,stroke:#c2185b,stroke-width:3px
    style A50 fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px
    style A70 fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px

Insertion Implementation

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
def insert(self, key):
    update = [None] * (self.max_level + 1)
    current = self.head
    
    # Find position and track update points
    for level in range(self.level, -1, -1):
        while current.forward[level] and current.forward[level].key < key:
            current = current.forward[level]
        update[level] = current
    
    # Generate random level for new node
    new_level = random_level(self.max_level)
    
    # If new level exceeds current max, update head pointers
    if new_level > self.level:
        for level in range(self.level + 1, new_level + 1):
            update[level] = self.head
        self.level = new_level
    
    # Create new node
    new_node = SkipListNode(key, new_level)
    
    # Insert node by updating forward pointers
    for level in range(new_level + 1):
        new_node.forward[level] = update[level].forward[level]
        update[level].forward[level] = new_node

The beauty of this algorithm: no rotations, no rebalancing. Just pointer updates.

How Skip List Deletion Works

Deletion is the reverse of insertion:

  1. Find the node to delete (tracking update points)
  2. Update forward pointers at each level to skip over the deleted node
  3. Reduce skip list level if the highest level is now empty

Deletion Implementation

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
def delete(self, key):
    update = [None] * (self.max_level + 1)
    current = self.head
    
    # Find the node and track update points
    for level in range(self.level, -1, -1):
        while current.forward[level] and current.forward[level].key < key:
            current = current.forward[level]
        update[level] = current
    
    target = current.forward[0]
    
    if target and target.key == key:
        # Remove target from each level
        for level in range(self.level + 1):
            if update[level].forward[level] != target:
                break
            update[level].forward[level] = target.forward[level]
        
        # Reduce level if highest levels are now empty
        while self.level > 0 and self.head.forward[self.level] is None:
            self.level -= 1
        
        return True
    return False

Skip List Time Complexity

Operation Average Case Worst Case
Search O(log n) O(n)
Insert O(log n) O(n)
Delete O(log n) O(n)
Space O(n) O(n log n)

The worst case happens if all nodes get the same level (extremely unlikely) or if levels are heavily skewed. But the probability of this is negligible.

Probability Analysis

For a skip list with n = 250 elements, William Pugh calculated that the probability of any search taking more than 3 times the expected time is less than one in a million.

The expected number of comparisons for a search is:

1
(log₂ n) / p + 1/p

With p = 0.5, this is roughly 2 log₂ n comparisons. Very close to a perfectly balanced binary search tree.

Skip List vs Balanced Trees

Why would you choose a skip list over a red-black tree or AVL tree?

Feature Skip List Red-Black Tree AVL Tree
Search complexity O(log n) avg O(log n) O(log n)
Implementation complexity Simple Complex Complex
Rotations needed No Yes Yes
Rebalancing No Yes Yes
Concurrent access Easy Hard Hard
Range queries Fast Fast Fast
Memory overhead ~2n pointers 3n pointers 3n pointers
Cache locality Good Variable Variable

When to Choose Skip Lists

Simplicity matters: If you need to implement the data structure yourself, skip lists are much easier to get right. Red-black tree insertion has multiple cases and rotations. Skip list insertion is straightforward.

Concurrent access: Skip lists naturally support concurrent operations. Insertions only modify local pointers, making lock-free implementations feasible. Java’s ConcurrentSkipListMap takes advantage of this.

Frequent insertions: Skip lists handle rapid insertions well since there is no rebalancing overhead.

When Balanced Trees Might Be Better

Guaranteed worst case: If you absolutely cannot tolerate O(n) worst case (even with one in a million probability), balanced trees guarantee O(log n).

Memory constrained: Skip lists have more pointer overhead per node on average.

Write-once, read-many: If data is inserted once and searched many times, the extra implementation complexity of balanced trees might be worth it for slightly faster operations.

Real World Skip List Usage

Redis Sorted Sets

Redis uses skip lists for its sorted set (ZSET) implementation. When you run commands like:

1
2
3
ZADD leaderboard 1000 "player1"
ZADD leaderboard 2500 "player2"
ZRANK leaderboard "player1"

Redis stores this in a skip list. The skip list allows:

  • O(log n) insert of new scores
  • O(log n) rank queries
  • O(log n) range queries by score

Antirez (Redis creator) chose skip lists over balanced trees because they are simpler to implement and debug, and perform well in practice.

LevelDB and RocksDB

Google’s LevelDB and Facebook’s RocksDB use skip lists for their MemTable (in-memory buffer before flushing to disk).

When you write to LevelDB:

  1. Data goes into a skip list in memory
  2. When the skip list reaches a size threshold, it flushes to disk as a sorted file
  3. The skip list provides fast writes and ordered iteration

Skip lists are ideal here because:

  • Writes are frequent and must be fast
  • Data needs to be iterated in sorted order during flush
  • Simple implementation reduces bugs

Apache Lucene

Lucene uses skip lists in its inverted index for efficient posting list intersection. When searching for documents matching multiple terms, skip lists allow jumping ahead in sorted document ID lists.

Java ConcurrentSkipListMap

Java’s standard library includes ConcurrentSkipListMap and ConcurrentSkipListSet. These provide thread-safe sorted collections without global locks.

1
2
3
4
5
6
7
8
9
ConcurrentSkipListMap<Integer, String> map = new ConcurrentSkipListMap<>();
map.put(3, "three");
map.put(1, "one");
map.put(2, "two");

// Iteration is in sorted order
for (Map.Entry<Integer, String> entry : map.entrySet()) {
    System.out.println(entry.getKey() + ": " + entry.getValue());
}

Complete Skip List Implementation

Here is a complete, working skip list implementation in Python:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
import random

class SkipListNode:
    def __init__(self, key, level):
        self.key = key
        self.forward = [None] * (level + 1)

class SkipList:
    def __init__(self, max_level=16, p=0.5):
        self.max_level = max_level
        self.p = p
        self.level = 0
        self.head = SkipListNode(None, max_level)
    
    def _random_level(self):
        level = 0
        while random.random() < self.p and level < self.max_level:
            level += 1
        return level
    
    def search(self, key):
        current = self.head
        
        for level in range(self.level, -1, -1):
            while current.forward[level] and current.forward[level].key < key:
                current = current.forward[level]
        
        current = current.forward[0]
        
        if current and current.key == key:
            return current.key
        return None
    
    def insert(self, key):
        update = [None] * (self.max_level + 1)
        current = self.head
        
        for level in range(self.level, -1, -1):
            while current.forward[level] and current.forward[level].key < key:
                current = current.forward[level]
            update[level] = current
        
        current = current.forward[0]
        
        # If key already exists, do not insert duplicate
        if current and current.key == key:
            return False
        
        new_level = self._random_level()
        
        if new_level > self.level:
            for level in range(self.level + 1, new_level + 1):
                update[level] = self.head
            self.level = new_level
        
        new_node = SkipListNode(key, new_level)
        
        for level in range(new_level + 1):
            new_node.forward[level] = update[level].forward[level]
            update[level].forward[level] = new_node
        
        return True
    
    def delete(self, key):
        update = [None] * (self.max_level + 1)
        current = self.head
        
        for level in range(self.level, -1, -1):
            while current.forward[level] and current.forward[level].key < key:
                current = current.forward[level]
            update[level] = current
        
        target = current.forward[0]
        
        if target and target.key == key:
            for level in range(self.level + 1):
                if update[level].forward[level] != target:
                    break
                update[level].forward[level] = target.forward[level]
            
            while self.level > 0 and self.head.forward[self.level] is None:
                self.level -= 1
            
            return True
        return False
    
    def display(self):
        print("Skip List:")
        for level in range(self.level, -1, -1):
            print(f"Level {level}: ", end="")
            node = self.head.forward[level]
            while node:
                print(f"{node.key} -> ", end="")
                node = node.forward[level]
            print("NIL")

# Usage example
skip_list = SkipList()

# Insert elements
for val in [30, 10, 50, 20, 70, 40, 60]:
    skip_list.insert(val)

skip_list.display()

# Search
print(f"Search 40: {skip_list.search(40)}")
print(f"Search 35: {skip_list.search(35)}")

# Delete
skip_list.delete(40)
print(f"After deleting 40, search 40: {skip_list.search(40)}")

Skip List Optimizations

Choosing the Right Probability

The standard probability is p = 0.5, but you can tune it:

p value Avg pointers per node Search comparisons
0.5 2 2 log₂ n
0.25 1.33 4 log₄ n
0.125 1.14 8 log₈ n

Lower p values use less memory but require more comparisons. For most cases, p = 0.5 or p = 0.25 works well.

Setting Maximum Level

The maximum level should be chosen based on expected data size:

1
max_level = log(n) / log(1/p)

For n = 1 million elements with p = 0.5:

1
max_level = log(1000000) / log(2) ≈ 20

A max_level of 16 to 32 works for most applications.

Unrolled Skip Lists

For better cache performance, store multiple keys per node instead of one. This reduces pointer chasing and improves locality.

Common Interview Questions

Q: Why use skip lists instead of hash tables?

Hash tables give O(1) average lookup but do not maintain sorted order. Skip lists give O(log n) lookup but support range queries, finding minimum/maximum, and iterating in sorted order. Use hash tables for exact key lookup, skip lists when you need ordering.

Q: How do you choose the maximum level?

Use log₂(n) where n is the expected number of elements. In practice, 16 to 32 levels handles billions of elements.

Q: Can skip lists have duplicate keys?

Yes. You can modify insertion to allow duplicates by not checking for existing keys, or by using secondary sorting.

Q: What is the expected space usage?

With p = 0.5, each node has 2 pointers on average (1 + 1/2 + 1/4 + …). Total space is O(n) with approximately 2n pointers.

Q: Why does Redis use skip lists instead of red-black trees?

Redis creator Antirez explained: skip lists are simpler to implement correctly, have comparable performance, and are easier to debug. The simplicity matters for maintaining a complex system like Redis.

Key Takeaways

  1. Skip lists achieve O(log n) search without rotations or rebalancing. They use randomness to maintain balance probabilistically.

  2. The structure is simple. A sorted linked list with multiple levels of forward pointers. Higher levels skip over more nodes.

  3. Level assignment uses coin flips. Each node’s height is determined randomly during insertion. This naturally creates logarithmic height.

  4. Real systems use skip lists. Redis sorted sets, LevelDB MemTable, Java ConcurrentSkipListMap. They are production-proven.

  5. Skip lists excel at concurrent access. Insertions are local, making lock-free implementations practical.

  6. Trade-off: simplicity for guaranteed worst case. O(n) worst case is possible but extremely unlikely. For most applications, this is acceptable.

  7. Implementation is straightforward. About 100 lines of code for a working skip list. Compare that to a correct red-black tree implementation.


Further Reading:

Working with sorted data in production? Skip lists might be the simple solution you need. For in-memory caching, check out Caching Strategies Explained. Building distributed systems? See How Kafka Works for handling high-throughput data streams.