Software Engineering Glossary

Fencing Token

Also known as: Epoch Number, Generation Number

A fencing token is a monotonically increasing number issued each time a lease or lock is granted. The protected resource remembers the highest token it has seen and rejects any request carrying a smaller one. This stops a holder that was paused or cut off from waking up after its lease has expired and writing to a resource it no longer owns.

Key Takeaways

  • Fencing tokens close a gap that lease timeouts alone leave open. A slow or paused holder that still believes it owns the lease cannot write after the lease has been reassigned.
  • The lease service has to hand out tokens that always go up, and the storage layer has to reject smaller tokens. Without both, the token does nothing.
  • Without fencing, one long GC pause can corrupt shared state. Martin Kleppmann’s 2016 post “How to do distributed locking” walks through this scenario.
  • Token, epoch, and generation are different names for the same idea in systems such as ZooKeeper, etcd, Kafka, and HDFS.

How It Works

  1. When the lease service grants a lease, it stamps it with a token bigger than any token it has ever issued.
  2. The client sends the token along with every write to the protected resource.
  3. The resource compares the token to the highest one it has accepted. Anything smaller is rejected.
  4. When a new holder takes over, it gets a fresher token. Any late writes from the old holder fail safely.
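
The four steps above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not a production design: LeaseService, FencedStore, StaleTokenError, and run_scenario are hypothetical names invented for this example, lease expiry is simulated by simply granting a new lease, and a real service would need to persist its counter so tokens keep increasing across restarts.

```python
import itertools


class StaleTokenError(Exception):
    """Raised when a write carries a token older than one already accepted."""


class LeaseService:
    """Hypothetical lease service: every grant gets a strictly larger token."""

    def __init__(self):
        # In-memory counter for illustration only; a real service would
        # persist it so tokens never go backwards after a restart.
        self._counter = itertools.count(1)
        self.holder = None

    def grant(self, client_id):
        # Step 1: stamp the lease with a token larger than any issued before.
        self.holder = client_id
        return next(self._counter)


class FencedStore:
    """Hypothetical protected resource that remembers the highest token seen."""

    def __init__(self):
        self.highest_token = 0
        self.value = None

    def write(self, token, value):
        # Step 3: reject anything smaller than the highest accepted token.
        if token < self.highest_token:
            raise StaleTokenError(f"token {token} < {self.highest_token}")
        self.highest_token = token
        self.value = value


def run_scenario():
    leases = LeaseService()
    store = FencedStore()

    token_a = leases.grant("client-a")
    store.write(token_a, "a-first")   # Step 2: the token travels with the write.

    # Client A stalls (for example in a long GC pause). Its lease expires
    # and the service grants the lease to client B.
    token_b = leases.grant("client-b")  # Step 4: B gets a fresher token.
    store.write(token_b, "b-first")

    # Client A wakes up and retries with its old token.
    try:
        store.write(token_a, "a-late")
        rejected = False
    except StaleTokenError:
        rejected = True
    return token_a, token_b, rejected, store.value
```

In this sketch the check lives in the store rather than in the client, because the paused client has no way of knowing that its lease has expired.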

Where It Is Used

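  • Kafka’s controller epoch and producer epoch are fencing tokens. Brokers reject requests that carry an older epoch, which stops stale controllers and zombie producers from writing.
  • ZooKeeper itself does not fence writes, but a client-side lock can use a znode’s zxid (such as the czxid of the lock znode) or a znode version counter (such as the parent’s cversion) as a natural fencing token.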
  • HDFS NameNode lease recovery uses generation stamps as fencing tokens to invalidate stale block writes.
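
The epoch-style variants listed above usually keep one counter per logical writer rather than one global counter. The sketch below shows that shape in Python under stated assumptions: EpochRegistry, FencedError, demo, and the "orders-writer" id are all hypothetical, and the code does not reproduce the API or wire protocol of Kafka, ZooKeeper, or HDFS.

```python
class FencedError(Exception):
    """Raised when a request carries an epoch older than the current one."""


class EpochRegistry:
    """Hypothetical server-side table holding one current epoch per logical id."""

    def __init__(self):
        self._epochs = {}
        self.log = []

    def register(self, logical_id):
        # A new instance of the same logical id bumps the epoch,
        # which implicitly fences every older instance.
        epoch = self._epochs.get(logical_id, 0) + 1
        self._epochs[logical_id] = epoch
        return epoch

    def append(self, logical_id, epoch, record):
        current = self._epochs.get(logical_id, 0)
        if epoch < current:
            raise FencedError(f"{logical_id}: epoch {epoch} < {current}")
        self.log.append((logical_id, epoch, record))


def demo():
    registry = EpochRegistry()

    old = registry.register("orders-writer")     # first instance
    registry.append("orders-writer", old, "r1")

    new = registry.register("orders-writer")     # restarted instance
    registry.append("orders-writer", new, "r2")

    # The old instance is still running (a "zombie") and tries to write.
    try:
        registry.append("orders-writer", old, "r3-from-zombie")
    except FencedError:
        zombie_fenced = True
    else:
        zombie_fenced = False
    return zombie_fenced, registry.log
```

Keying the epoch by logical id means that restarting one writer fences only its own predecessor and leaves unrelated writers untouched.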