Load Balancing
Definition
Load balancing spreads incoming requests across a pool of servers so no single machine gets overwhelmed. A load balancer sits in front of the servers, checks which ones are healthy, and routes each request to one of them. This is how a service scales horizontally and stays available when a server dies.
Key Takeaways
- A load balancer turns many servers into one address. Clients talk to it, and it decides which backend actually handles the request.
- Layer 4 balancing routes by IP address and port without inspecting the payload, which keeps it fast. Layer 7 balancing reads the request itself and can route by URL, header, or cookie.
- Health checks and heartbeats let the balancer stop sending traffic to a dead or slow server.
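- Common algorithms are round robin, least connections, and hashing. Hashing on a client identifier such as the source IP or a session cookie keeps a client on the same server for sticky sessions, and consistent hashing limits how many clients are remapped when servers join or leave the pool.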
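The hashing takeaway can be made concrete with a small consistent hash ring. This is a minimal sketch, not how any particular load balancer implements it: the class name `HashRing`, the `vnodes` parameter, the server names, and the choice of SHA-256 are all assumptions made for illustration.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a position on the ring (a 64-bit integer)."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    """Illustrative consistent hash ring with virtual nodes (hypothetical API)."""

    def __init__(self, servers, vnodes=100):
        self._vnodes = vnodes
        self._points = []  # sorted ring positions
        self._owner = {}   # ring position -> server name
        for server in servers:
            self.add(server)

    def add(self, server):
        # Each server is placed at several positions to even out the load.
        for i in range(self._vnodes):
            point = _hash(f"{server}#{i}")
            self._owner[point] = server
            bisect.insort(self._points, point)

    def remove(self, server):
        self._points = [p for p in self._points if self._owner[p] != server]
        self._owner = {p: s for p, s in self._owner.items() if s != server}

    def lookup(self, client_key):
        """Return the server owning the first ring position after the key."""
        if not self._points:
            raise LookupError("no servers in ring")
        i = bisect.bisect_right(self._points, _hash(client_key))
        return self._owner[self._points[i % len(self._points)]]


# Usage: removing one server only moves the clients that were mapped to it.
ring = HashRing(["app-1", "app-2", "app-3"])  # hypothetical server names
clients = [f"client-{n}" for n in range(1000)]
before = {c: ring.lookup(c) for c in clients}
ring.remove("app-2")
after = {c: ring.lookup(c) for c in clients}
moved = [c for c in clients if before[c] != after[c]]
print(all(before[c] == "app-2" for c in moved))  # True
```

With plain modulo hashing (`hash(key) % len(servers)`), removing a server would typically remap most clients. The ring keeps every client whose server is still present where it was.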
How It Works
- Clients send requests to the load balancer’s address instead of any single server.
- The balancer keeps a list of healthy backends, updated by periodic health checks.
- For each request it picks a backend using an algorithm like round robin or least connections.
- If a backend fails its health checks, it is dropped from the pool until it recovers, so new requests stop reaching it once the failure is detected.
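The steps above can be sketched in a few lines of Python. This is an in-memory toy under stated assumptions, not a production balancer: `Backend`, `LoadBalancer`, and the `probe` callback are hypothetical names, and a real health check would be a network call (for example a TCP connect or an HTTP request) run on a timer.

```python
class Backend:
    def __init__(self, name):
        self.name = name
        self.healthy = True
        self.active = 0  # requests currently in flight


class LoadBalancer:
    """Toy balancer: health-checked pool plus two selection algorithms."""

    def __init__(self, names):
        self.backends = [Backend(n) for n in names]
        self._cursor = 0  # round-robin position

    def run_health_checks(self, probe):
        # probe(name) -> bool is a hypothetical stand-in for a real check.
        for backend in self.backends:
            backend.healthy = bool(probe(backend.name))

    def pick_round_robin(self):
        # Advance through the list in order, skipping unhealthy backends.
        for _ in range(len(self.backends)):
            backend = self.backends[self._cursor % len(self.backends)]
            self._cursor += 1
            if backend.healthy:
                return backend
        raise RuntimeError("no healthy backends")

    def pick_least_connections(self):
        pool = [b for b in self.backends if b.healthy]
        if not pool:
            raise RuntimeError("no healthy backends")
        return min(pool, key=lambda b: b.active)


lb = LoadBalancer(["a", "b", "c"])  # hypothetical backend names
print([lb.pick_round_robin().name for _ in range(4)])  # ['a', 'b', 'c', 'a']

# Backend "b" starts failing its health check and is skipped from now on.
lb.run_health_checks(lambda name: name != "b")
print([lb.pick_round_robin().name for _ in range(3)])  # ['c', 'a', 'c']
```

Round robin ignores how busy each backend is, which works well when requests cost roughly the same. `pick_least_connections` is the usual alternative when request durations vary widely, since it steers new work toward the backend with the fewest in-flight requests.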
Where It Is Used
- AWS Elastic Load Balancing, Google Cloud Load Balancing, and Azure Load Balancer offer load balancing as managed services at cloud scale.
- NGINX and HAProxy are widely used software load balancers, often paired with rate limiting.
- Cloudflare and other CDNs balance traffic across data centers using anycast and DNS.