Software Engineering Glossary

Rate Limiting

Also known as: Rate Limiter Throttling

Definition

Rate limiting caps how many requests a client can make in a window of time, protecting a service from overload, abuse, and runaway costs. The common algorithms are the token bucket, which allows short bursts and refills steadily, and the leaky bucket, which enforces a constant output rate. When a client goes over the limit the server returns HTTP 429 with Retry-After so a well-behaved client can back off.

Key Takeaways

Rate limiting protects a backend from being swamped by one noisy client, a retry storm, or a denial-of-service attempt.
Token bucket allows bursts and refills steadily; leaky bucket smooths traffic to a constant rate.
In a distributed setup the counter must be shared, usually in Redis with INCR plus a TTL or an atomic Lua script.
Return HTTP 429 with X-RateLimit-* headers and Retry-After so clients can throttle themselves instead of hammering you.

How It Works

Choose the limit and window, for example 100 requests per minute per API key.
On each request, atomically increment the client’s counter in a shared store and check it against the limit.
If the request is under the limit, allow it; otherwise reject with HTTP 429 and a Retry-After hint.
Counters expire at the end of the window (token bucket refills over time) so the budget resets.

Where It Is Used

Cloudflare, Stripe, and GitHub all expose rate limits with X-RateLimit headers on their APIs.
NGINX and Envoy ship built-in rate limiting at the proxy layer.
Redis is the standard backing store for distributed limiters because INCR and EXPIRE are atomic and fast.

Rate Limiting

Definition

Key Takeaways

How It Works

Where It Is Used

Related glossary terms

Thundering Herd

Circuit Breaker

Caching

Message Queue