Rate Limiting
Controlling how many requests a client can make in a given time window
Overview
Rate limiting caps how frequently a client can call a service within a time window. It protects systems from abuse, accidental request storms, and uneven load, and it enforces fair usage across tenants. When a client exceeds its allowance, the server typically responds with HTTP 429 (Too Many Requests) and a Retry-After header that tells the client how long to wait before trying again.
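As a minimal sketch of the rejection response described above, the helper below builds a status, header, and body triple. The function name and the return shape are illustrative assumptions, not any particular framework's API.

```python
import math

def too_many_requests(retry_after_seconds: float) -> tuple[int, dict[str, str], str]:
    """Build a (status, headers, body) triple for a throttled request.

    Hypothetical helper: adapt the return shape to your web framework.
    """
    # Retry-After is sent here as whole seconds; round up so the client
    # does not retry before the limit has actually reset.
    wait = max(1, math.ceil(retry_after_seconds))
    headers = {"Retry-After": str(wait), "Content-Type": "text/plain"}
    return 429, headers, "Too Many Requests\n"

status, headers, body = too_many_requests(2.3)
print(status, headers["Retry-After"])
```

Rounding up (and never sending less than one second) is a design choice in this sketch, so that a well-behaved client's next attempt lands after the reset rather than just before it.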
Syntax / Usage
Most limiters track a counter per client key (API key, user ID, or IP) in a fast shared store like Redis so all servers agree on the count. The token bucket is one of the most widely used algorithms: tokens refill at a steady rate up to a fixed capacity, each request consumes one, and requests are rejected when the bucket is empty. The capacity bounds the burst size, while the refill rate sets the sustained rate.
token_bucket(key, capacity=100, refill=10/sec):
    now = current_time()
    tokens = min(capacity, stored_tokens + (now - last_refill) * refill)
    if tokens >= 1:
        tokens -= 1
        allow request
    else:
        reject with 429, Retry-After
    save tokens, last_refill=now
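The pseudocode above can be sketched as runnable Python. This version keeps state in process memory for a single key, which is only suitable for illustration; the class name, the injectable clock, and the returned wait time are assumptions added here, not part of the original.

```python
import time

class TokenBucket:
    """In-memory token bucket for one client key (illustrative sketch).

    A production limiter would keep `tokens` and `last_refill` in a shared
    store and update them atomically.
    """

    def __init__(self, capacity=100, refill_per_sec=10.0, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock                 # injectable so tests can control time
        self.tokens = float(capacity)      # start with a full bucket
        self.last_refill = clock()

    def allow(self):
        """Return (allowed, retry_after_seconds)."""
        now = self.clock()
        elapsed = now - self.last_refill
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True, 0.0
        # Time until one whole token has accumulated.
        return False, (1 - self.tokens) / self.refill_per_sec
```

To limit several clients, one option is a dictionary mapping each key to its own TokenBucket. The second return value can feed a Retry-After header on rejection.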
Other algorithms: fixed window (simple but bursty at boundaries), sliding window log (accurate but memory-heavy), and leaky bucket (smooths output to a constant rate).
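For comparison, here is a minimal sliding window log, assuming in-memory storage and hypothetical names. It stores one timestamp per accepted request, which is where the memory cost mentioned above comes from.

```python
import time
from collections import defaultdict, deque

class SlidingWindowLog:
    """Sliding window log limiter (illustrative, single-process sketch)."""

    def __init__(self, limit, window_sec, clock=time.monotonic):
        self.limit = limit
        self.window_sec = window_sec
        self.clock = clock
        self.log = defaultdict(deque)  # key -> timestamps of accepted requests

    def allow(self, key):
        now = self.clock()
        stamps = self.log[key]
        # Drop timestamps that have aged out of the trailing window.
        while stamps and stamps[0] <= now - self.window_sec:
            stamps.popleft()
        if len(stamps) < self.limit:
            stamps.append(now)
            return True
        return False
```

Because the window trails the current time instead of resetting on a fixed schedule, there is no boundary at which a client can double up.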
Examples
A public API allows 1,000 requests per hour per API key. The gateway increments a Redis counter with a one-hour TTL and returns X-RateLimit-Remaining so clients can self-throttle.
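The gateway counter in this example might look like the sketch below. A plain dictionary stands in for Redis so the code runs on its own; in a real deployment the increment and expiry would be done atomically in the shared store (for instance with Redis INCR and EXPIRE). The class and method names are hypothetical.

```python
import math
import time

class FixedWindowCounter:
    """Fixed-window counter with a TTL, using a dict as a stand-in for Redis."""

    def __init__(self, limit=1000, window_sec=3600, clock=time.time):
        self.limit = limit
        self.window_sec = window_sec
        self.clock = clock
        self.counters = {}  # key -> (count, expires_at)

    def hit(self, key):
        """Record one request; return (allowed, response_headers)."""
        now = self.clock()
        count, expires_at = self.counters.get(key, (0, 0.0))
        if now >= expires_at:
            # Counter expired or never existed: the window starts at this hit.
            count, expires_at = 0, now + self.window_sec
        count += 1
        self.counters[key] = (count, expires_at)
        allowed = count <= self.limit
        headers = {"X-RateLimit-Remaining": str(max(0, self.limit - count))}
        if not allowed:
            # Tell the client when the current window ends.
            headers["Retry-After"] = str(max(1, math.ceil(expires_at - now)))
        return allowed, headers
```

A gateway would call hit(api_key) once per request, attach the returned headers to the response, and send a 429 when allowed is False.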
A login endpoint allows 5 attempts per minute per IP to slow brute-force attacks, using a fixed-window counter with a short window.
A payment webhook consumer uses a leaky bucket to process events at a steady 50/sec, preventing downstream databases from being overwhelmed during traffic spikes.
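One way to sketch such a consumer is a loop that spaces out handler calls by a fixed interval. The function name, the injectable clock and sleep, and the handler callback are all assumptions made for illustration; the events could come from any iterable, such as a queue reader.

```python
import time

def consume(events, handle, rate_per_sec=50.0, clock=time.monotonic, sleep=time.sleep):
    """Process events no faster than `rate_per_sec` (leaky-bucket style pacing).

    Hypothetical sketch: `events` is any iterable, `handle` processes one event.
    """
    interval = 1.0 / rate_per_sec
    next_slot = clock()  # earliest time the next event may be handled
    for event in events:
        delay = next_slot - clock()
        if delay > 0:
            sleep(delay)  # wait for the next slot instead of bursting
        handle(event)
        # If handling ran past the slot, schedule from now rather than
        # catching up, so a slow period never causes a later burst.
        next_slot = max(next_slot, clock()) + interval
```

However quickly events arrive upstream, the downstream handler sees at most one call per interval (20 ms at 50/sec). Events that arrive faster simply wait in the source iterable.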
Common Mistakes
- Storing counters in per-server memory, so limits break behind a load balancer
- Using a fixed window, letting clients burst at the window boundary (nearly 2x the intended rate)
- Rate limiting by IP only, punishing users behind shared NATs and proxies
- Returning a generic error instead of 429 with Retry-After
- Applying one global limit instead of tiering limits by endpoint cost or plan
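The fixed-window boundary burst from the list above can be reproduced with a small simulation. It assumes clock-aligned windows and a hypothetical helper name; the request times are made up to hit the boundary.

```python
def fixed_window_accepts(timestamps, limit, window_sec):
    """Return the request times a clock-aligned fixed-window counter accepts."""
    counts = {}    # window index -> requests accepted in that window
    accepted = []
    for t in timestamps:
        window = int(t // window_sec)
        if counts.get(window, 0) < limit:
            counts[window] = counts.get(window, 0) + 1
            accepted.append(t)
    return accepted

# 100 requests just before the minute boundary, 100 more right at it.
burst = [59.9] * 100 + [60.0] * 100
accepted = fixed_window_accepts(burst, limit=100, window_sec=60)
print(len(accepted))  # 200: twice the per-minute limit within 0.1 s
```

Each window on its own stays within 100 requests, yet the client gets 200 through in a tenth of a second. A sliding window or token bucket closes this gap.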
See Also
system-design-apis, system-design-load-balancing, system-design-caching