Software Engineering Glossary

Heartbeat

Also known as: Liveness Probe, Keep-Alive

A heartbeat is a small message that a node sends at a regular interval to signal that it is still alive. If the heartbeats stop, other nodes assume the node has failed and react by expiring its leases, electing a new leader, or moving traffic away from it. Heartbeats are the simplest way to turn the abstract idea of liveness into a signal a system can act on.
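To make the lease-expiry reaction concrete, here is a minimal Python sketch of a lease that stays valid only while its holder keeps heartbeating. The Lease class, its duration field, and the try_acquire and heartbeat methods are hypothetical names chosen for illustration. Time is passed in explicitly as now so the behavior is easy to follow. A real lease implementation would also have to allow for clock skew between nodes, which this sketch ignores.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    """A lease kept alive by heartbeats (hypothetical sketch, ignores clock skew)."""

    duration: float                # seconds the lease stays valid after the last renewal
    holder: Optional[str] = None   # node currently holding the lease, if any
    renewed_at: float = 0.0        # time of the holder's last successful heartbeat

    def is_expired(self, now: float) -> bool:
        # A free lease counts as expired; a held lease expires once heartbeats stop.
        return self.holder is None or now - self.renewed_at > self.duration

    def try_acquire(self, node: str, now: float) -> bool:
        """Take the lease if it is free, expired, or already held by this node."""
        if self.holder == node or self.is_expired(now):
            self.holder = node
            self.renewed_at = now
            return True
        return False

    def heartbeat(self, node: str, now: float) -> bool:
        """Renew the lease. Only the current holder of an unexpired lease may renew."""
        if self.holder == node and not self.is_expired(now):
            self.renewed_at = now
            return True
        return False


if __name__ == "__main__":
    lease = Lease(duration=10.0)
    lease.try_acquire("a", now=0.0)           # a becomes the holder
    lease.heartbeat("a", now=5.0)             # renewed: valid until t=15
    print(lease.try_acquire("b", now=12.0))   # False: a is still within its lease
    print(lease.try_acquire("b", now=16.0))   # True: a's heartbeats stopped, lease expired
```

Once a holder's lease has expired, heartbeat refuses to renew it, so a node that was cut off cannot silently reclaim a lease another node may already hold.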

Key Takeaways

  • Heartbeats are the cheapest way to ask if a node is still alive without coordinating with anyone else.
  • The heartbeat interval and the timeout together decide how quickly a failure is detected and recovery begins. Shorter values detect failures faster but add network traffic and cause more false alarms on a slow or congested network.
  • Heartbeats underpin lease renewal, leader stability in Raft's steady state, gossip-style failure detection, and load balancer health checks.
  • Heartbeats can lie. A node can keep sending them while making no real progress, for example when its heartbeat thread is healthy but its worker thread is deadlocked. If correctness matters, also check that work is moving forward.
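One way to act on the last takeaway is to have each heartbeat carry a progress counter, such as the number of requests completed or the last log index applied, and to flag a node whose counter stops moving. The following Python sketch assumes that design; the ProgressAwareMonitor class, the progress field, and the stall_timeout parameter are hypothetical and not taken from any particular system.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PeerState:
    last_seen: float     # when the last heartbeat arrived
    progress: int        # highest progress counter reported (hypothetical field)
    progress_at: float   # when that counter last increased


@dataclass
class ProgressAwareMonitor:
    timeout: float         # no heartbeat for this long -> "dead"
    stall_timeout: float   # heartbeats arriving but no progress for this long -> "stalled"
    peers: Dict[str, PeerState] = field(default_factory=dict)

    def on_heartbeat(self, peer: str, progress: int, now: float) -> None:
        state = self.peers.get(peer)
        if state is None:
            self.peers[peer] = PeerState(now, progress, now)
            return
        state.last_seen = now
        # Only an increasing counter counts as progress; repeats just prove liveness.
        if progress > state.progress:
            state.progress = progress
            state.progress_at = now

    def status(self, peer: str, now: float) -> str:
        state = self.peers.get(peer)
        if state is None or now - state.last_seen > self.timeout:
            return "dead"
        if now - state.progress_at > self.stall_timeout:
            return "stalled"
        return "healthy"


if __name__ == "__main__":
    monitor = ProgressAwareMonitor(timeout=3.0, stall_timeout=5.0)
    for t in (0.0, 2.0, 4.0, 6.0):
        # n1 heartbeats on time, but its progress counter never moves past 10.
        monitor.on_heartbeat("n1", progress=10, now=t)
    print(monitor.status("n1", now=6.0))  # stalled
```

In the example, n1 never misses a heartbeat, so a plain timeout check would report it as alive; only the frozen progress counter reveals that it has stopped doing useful work.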

How It Works

  1. Each node sends a small message to its peers, a coordinator, or a control plane at a fixed interval, typically anywhere from 100 ms to a few seconds.
  2. Each receiver tracks the last heartbeat time per peer.
  3. If the time since a peer's last heartbeat exceeds the timeout, usually 3 to 10 times the interval, the receiver marks that peer as suspect or dead.
  4. Recovery actions then follow, such as lease expiry, failover, leader election, or rebalancing.
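The four steps above can be sketched as a fixed-timeout failure detector in Python. The HeartbeatDetector class and its method names are hypothetical. Marking a peer suspect after 3 missed intervals and dead after 10 is one illustrative way to use the "3 to 10 times the interval" range from step 3, not a rule from any specific system. Timestamps are passed in as now so the detector can be driven by a real clock or by tests.

```python
from typing import Dict, List


class HeartbeatDetector:
    """Fixed-timeout failure detector (minimal sketch, not any specific system's)."""

    def __init__(self, interval: float, suspect_after: int = 3, dead_after: int = 10) -> None:
        # Step 3: timeouts are multiples of the heartbeat interval (illustrative choice).
        self.suspect_timeout = suspect_after * interval
        self.dead_timeout = dead_after * interval
        self.last_heartbeat: Dict[str, float] = {}

    def record(self, peer: str, now: float) -> None:
        """Step 2: remember when each peer was last heard from."""
        self.last_heartbeat[peer] = now

    def status(self, peer: str, now: float) -> str:
        """Step 3: classify a peer by how long it has been silent."""
        last = self.last_heartbeat.get(peer)
        if last is None:
            return "unknown"
        gap = now - last
        if gap > self.dead_timeout:
            return "dead"
        if gap > self.suspect_timeout:
            return "suspect"
        return "alive"

    def sweep(self, now: float) -> List[str]:
        """Step 4 trigger: peers a caller should fail over or rebalance away from."""
        return [peer for peer in self.last_heartbeat if self.status(peer, now) == "dead"]


if __name__ == "__main__":
    detector = HeartbeatDetector(interval=0.5)  # suspect after 1.5 s, dead after 5 s
    detector.record("a", now=0.0)
    detector.record("b", now=0.0)
    print(detector.status("b", now=2.0))        # suspect: silent for 2 s
    detector.record("a", now=5.5)               # a keeps heartbeating, b stays silent
    print(detector.status("a", now=6.0))        # alive
    print(detector.sweep(now=6.0))              # ['b']: trigger failover for b
```

The intermediate "suspect" state gives a slow network a chance to recover before the expensive step 4 actions run. Production systems often replace fixed thresholds with an adaptive detector, such as the phi accrual failure detector used by Cassandra, which scores suspicion from the observed distribution of heartbeat arrival times.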

Where It Is Used

  • Kubernetes kubelets send node heartbeats by periodically renewing a Lease object in the kube-node-lease namespace, which tells the control plane the node is alive.
  • Raft leaders send empty AppendEntries RPCs as heartbeats so that followers do not time out and start new elections.
  • Cassandra and Amazon's Dynamo use gossip-based heartbeats so every node converges on an eventually consistent view of cluster membership.
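Gossip-style heartbeats work differently from the point-to-point scheme in How It Works: each node increments its own counter and periodically pushes its whole table of counters to a random peer, and receivers keep the highest counter seen for each node. The Python sketch below shows that idea in simplified form. GossipNode, gossip_round, and fail_after are hypothetical names, and this is not the actual protocol of Cassandra or Dynamo, both of which gossip more state than a single counter.

```python
import random
from typing import Dict, List, Set, Tuple


class GossipNode:
    """Gossip-style heartbeat table (simplified, hypothetical sketch)."""

    def __init__(self, name: str, fail_after: float) -> None:
        self.name = name
        self.fail_after = fail_after
        # peer -> (highest heartbeat counter seen, local time that counter last increased)
        self.table: Dict[str, Tuple[int, float]] = {name: (0, 0.0)}

    def tick(self, now: float) -> None:
        """Bump our own heartbeat counter once per interval."""
        counter, _ = self.table[self.name]
        self.table[self.name] = (counter + 1, now)

    def digest(self) -> Dict[str, int]:
        """Gossip only counters; timestamps stay local, so no clock sync is needed."""
        return {peer: counter for peer, (counter, _) in self.table.items()}

    def merge(self, digest: Dict[str, int], now: float) -> None:
        """Keep the higher counter per peer; a stale or equal counter changes nothing."""
        for peer, counter in digest.items():
            known = self.table.get(peer)
            if known is None or counter > known[0]:
                self.table[peer] = (counter, now)

    def live_peers(self, now: float) -> Set[str]:
        """Other nodes whose counter increased within the last fail_after seconds."""
        return {
            peer
            for peer, (_, seen) in self.table.items()
            if peer != self.name and now - seen <= self.fail_after
        }


def gossip_round(nodes: List[GossipNode], now: float, rng: random.Random) -> None:
    """Each node bumps its counter, then pushes its digest to one random peer."""
    for node in nodes:
        node.tick(now)
    for node in nodes:
        peer = rng.choice([n for n in nodes if n is not node])
        peer.merge(node.digest(), now)


if __name__ == "__main__":
    rng = random.Random(42)
    nodes = [GossipNode(name, fail_after=5.0) for name in "abcd"]
    for t in range(1, 11):
        gossip_round(nodes, float(t), rng)
    for node in nodes:
        print(node.name, sorted(node.live_peers(10.0)))
```

Because news of a counter increase spreads through several hops rather than one direct message, views converge gradually, which is why the membership view is eventually consistent rather than instantly agreed.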