Software Engineering Glossary

High Watermark

Also known as: HWM, Commit Index

The high watermark is the largest log offset that has been copied to a quorum of replicas. Anything at or below this point is committed and safe to read. Anything above it is still in flight and may be lost if the leader dies. It is the line between data that readers can trust and data that is still being replicated.

Key Takeaways

  • The high watermark is the boundary between data that is safe to read and data that is still being copied.
  • Leaders move the watermark forward only after a majority quorum has stored the entry.
  • Followers learn the watermark from the leader and use it to control their own reads or local applies.
  • Pair the high watermark with a low watermark to mark the range of log entries that must still be kept for replication or recovery.
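The last two takeaways can be illustrated with a small sketch of a follower-side log. Everything here is a hypothetical illustration rather than any particular system's API: the class name ReplicaLog, its methods, and the choice of an inclusive high watermark are all assumptions made for the example.

```python
class ReplicaLog:
    """Hypothetical follower log bounded by a low and a high watermark."""

    def __init__(self):
        self.entries = {}         # offset -> payload
        self.low_watermark = 0    # oldest offset that must still be kept
        self.high_watermark = -1  # newest committed offset (inclusive); -1 = none

    def append(self, offset, payload):
        self.entries[offset] = payload

    def on_leader_watermark(self, leader_hwm):
        # A follower cannot treat an entry as committed before it has stored it,
        # so clamp the leader's value to the last local offset. Never move back.
        last_local = max(self.entries, default=-1)
        self.high_watermark = max(self.high_watermark, min(leader_hwm, last_local))

    def read(self, offset):
        # Serve only offsets inside [low_watermark, high_watermark].
        if offset < self.low_watermark or offset > self.high_watermark:
            return None
        return self.entries.get(offset)

    def truncate_prefix(self, requested_low):
        # Discard old entries, but never anything that is not yet committed.
        new_low = max(self.low_watermark,
                      min(requested_low, self.high_watermark + 1))
        for offset in [o for o in self.entries if o < new_low]:
            del self.entries[offset]
        self.low_watermark = new_low
```

In this sketch a read above the high watermark returns None even when the entry is already on local disk, which is the point of the boundary: storing an entry and exposing it are separate steps.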

How It Works

  1. The leader appends new entries to its log and sends them to followers.
  2. Each follower acknowledges the highest entry it has stored so far.
  3. Once an entry is acknowledged by a majority quorum, the leader moves the high watermark to that entry.
  4. Readers are served only entries at or below the watermark, so they never see data that could still be lost.
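The steps above can be sketched as a leader that tracks the highest acknowledged offset per replica and takes the largest offset held by a majority. This is a minimal sketch under stated assumptions, not the implementation of any specific system: the Leader class and on_ack method are hypothetical names, the leader is assumed to count itself as one of the replicas, and real protocols add further conditions (Raft, for example, also checks the term of the entry).

```python
class Leader:
    """Hypothetical leader that advances a high watermark from follower acks."""

    def __init__(self, replica_ids):
        # match[r] = highest offset replica r is known to have stored (-1 = none).
        # The leader's own id is assumed to be included in replica_ids.
        self.match = {r: -1 for r in replica_ids}
        self.high_watermark = -1

    def on_ack(self, replica_id, offset):
        # Acks can arrive out of order, so keep only the highest offset seen.
        self.match[replica_id] = max(self.match[replica_id], offset)

        # Sort stored offsets from highest to lowest. The value at position
        # quorum - 1 is the largest offset stored on at least `quorum` replicas.
        quorum = len(self.match) // 2 + 1
        ranked = sorted(self.match.values(), reverse=True)
        candidate = ranked[quorum - 1]

        # The watermark only ever moves forward.
        self.high_watermark = max(self.high_watermark, candidate)
        return self.high_watermark
```

With three replicas, a single ack is not enough to move the watermark. It advances only once a second replica reports, and then only as far as the lower of the two best offsets.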

Where It Is Used

  • In Kafka, the high watermark bounds what consumers can read: they are served only records that every replica in the in-sync replica set (ISR) has stored. Records beyond it are still being replicated to the ISR.
  • Raft calls the same idea the commit index. The leader marks entries committed once a majority has stored them.
  • Pulsar, BookKeeper, and HDFS all use a watermark-style boundary between durable and in-flight data.