Software Engineering Glossary

Load Balancing

Also known as: Load Balancer, Traffic Distribution

Definition

Load balancing spreads incoming requests across a pool of servers so no single machine gets overwhelmed. A load balancer sits in front of the servers, checks which ones are healthy, and routes each request to one of them. This is how a service scales horizontally and stays available when a server dies.

Key Takeaways

A load balancer turns many servers into one address. Clients talk to it, and it decides which backend actually handles the request.
Layer 4 (transport-layer) balancing routes by IP address and port without reading the request, which keeps it fast. Layer 7 (application-layer) balancing reads the request and can route by URL, header, or cookie.
Health checks and heartbeats let the balancer stop sending traffic to a dead or slow server.
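Common algorithms are round robin, least connections, and hashing. Hashing on a client attribute such as IP address or session cookie keeps a client on the same server (sticky sessions), and consistent hashing limits how many clients get remapped when servers join or leave the pool.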
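
To make the hashing takeaway concrete, here is a minimal sketch of a consistent-hash ring in Python. The class name HashRing, the use of MD5, and the default of 100 virtual nodes per server are illustrative assumptions, not the scheme of any particular load balancer.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a point on the ring (MD5 used for spread, not security)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical consistent-hash ring with virtual nodes."""

    def __init__(self, servers, vnodes=100):
        self._vnodes = vnodes
        self._points = []  # sorted ring positions
        self._owner = {}   # ring position -> server name
        for server in servers:
            self.add(server)

    def add(self, server):
        # Each server owns several points so load spreads more evenly.
        for i in range(self._vnodes):
            point = _hash(f"{server}#{i}")
            self._owner[point] = server
            bisect.insort(self._points, point)

    def remove(self, server):
        # Drop only this server's points; all other points stay in place.
        for i in range(self._vnodes):
            point = _hash(f"{server}#{i}")
            if self._owner.get(point) == server:
                del self._owner[point]
                self._points.remove(point)

    def pick(self, client_key):
        # The first point clockwise from the key's hash owns the key.
        if not self._points:
            raise RuntimeError("no servers in the ring")
        idx = bisect.bisect(self._points, _hash(client_key)) % len(self._points)
        return self._owner[self._points[idx]]
```

Calling pick with the same client key (for example an IP address or session ID) returns the same server every time. When a server is removed, only the clients that were mapped to it move; everyone else stays where they were.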

How It Works

Clients send requests to the load balancer’s address instead of any single server.
The balancer keeps a list of healthy backends, updated by periodic health checks.
For each request it picks a backend using an algorithm like round robin or least connections.
If a backend fails a health check, it is dropped from the pool until it recovers, so new requests stop being routed to it.
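
The steps above can be sketched in a few lines of Python. This is a simplified in-memory model, not a working proxy: the Backend and Pool classes and the caller-supplied probe function are hypothetical names, and a real balancer would run its checks on a timer and forward actual network traffic.

```python
class Backend:
    def __init__(self, name):
        self.name = name
        self.healthy = True
        self.active = 0  # in-flight requests, used by least connections


class Pool:
    def __init__(self, names):
        self.backends = [Backend(n) for n in names]
        self._next = 0  # round-robin cursor

    def run_health_checks(self, probe):
        """Update health flags; probe is a function: backend name -> bool."""
        for b in self.backends:
            b.healthy = bool(probe(b.name))

    def round_robin(self):
        # Walk the list once from the cursor, skipping unhealthy backends.
        n = len(self.backends)
        for offset in range(n):
            b = self.backends[(self._next + offset) % n]
            if b.healthy:
                self._next = (self._next + offset + 1) % n
                return b
        raise RuntimeError("no healthy backends")

    def least_connections(self):
        # Pick the healthy backend with the fewest in-flight requests.
        healthy = [b for b in self.backends if b.healthy]
        if not healthy:
            raise RuntimeError("no healthy backends")
        return min(healthy, key=lambda b: b.active)
```

A failed backend is skipped rather than deleted, so it rejoins the rotation as soon as a later health check passes.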

Where It Is Used

AWS Elastic Load Balancing, Google Cloud Load Balancing, and Azure Load Balancer run this at cloud scale.
NGINX and HAProxy are widely used software load balancers, often paired with rate limiting.
Cloudflare and other CDNs balance traffic across data centers using anycast and DNS.

Related glossary terms

Rate Limiting

Consistent Hashing

Circuit Breaker

Microservices

Heartbeat

Content Delivery Network (CDN)

DDoS Attack