stackademic

Rate Limiting

Controlling how many requests a client can make in a given time window

Overview

Rate limiting caps how frequently a client can call a service within a time window. It protects systems from abuse, accidental request storms, and uneven load, and it enforces fair usage across tenants. When a client exceeds its allowance the server typically responds with HTTP 429 (Too Many Requests) and a Retry-After header.
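Here is a minimal sketch of that rejection path. The helper name too_many_requests is hypothetical, and it returns a status code and header dict instead of targeting any particular web framework. Retry-After's delay-seconds form is a whole number of seconds, so a fractional wait is rounded up.

```python
import math


def too_many_requests(wait_seconds: float) -> tuple[int, dict[str, str]]:
    """Build a 429 status and headers for a rejected request (hypothetical helper)."""
    # Retry-After (delay-seconds form) takes a whole number of seconds;
    # round up so a client that obeys it never retries too early.
    retry_after = max(0, math.ceil(wait_seconds))
    return 429, {"Retry-After": str(retry_after)}


status, headers = too_many_requests(0.35)
print(status, headers)  # 429 {'Retry-After': '1'}
```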

Syntax / Usage

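Distributed limiters typically keep per-client state (keyed by API key, user ID, or IP) in a fast shared store such as Redis, so every server behind the load balancer sees the same count. Each check is a read-modify-write, so it must run atomically in that store (in Redis, usually via a Lua script). Otherwise two servers can both spend the same last token. The token bucket is a widely used algorithm: tokens refill at a steady rate up to a fixed capacity, each request consumes one, and requests are rejected when the bucket is empty. Because a full bucket can be drained at once, it permits short bursts of up to capacity requests while enforcing the refill rate on average.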

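import time

def token_bucket(store, key, capacity=100, refill_rate=10.0):
    # store maps key -> {"tokens", "last_refill"}; a plain dict here, Redis in production
    now = time.time()
    state = store.setdefault(key, {"tokens": capacity, "last_refill": now})
    # Credit tokens earned since the last check, never exceeding capacity
    tokens = min(capacity, state["tokens"] + (now - state["last_refill"]) * refill_rate)
    state["last_refill"] = now
    if tokens >= 1:
        state["tokens"] = tokens - 1
        return True, 0.0                          # allow request
    state["tokens"] = tokens
    return False, (1 - tokens) / refill_rate      # reject with 429; seconds to wait for Retry-After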

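Other algorithms: fixed window (counts requests per clock interval; simple, but bursty at window boundaries), sliding window log (stores a timestamp per request; accurate but memory-heavy), and leaky bucket (queues requests and releases them at a constant rate, smoothing output).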
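A sliding window log fits in a few lines. This is an in-memory illustration with a hypothetical class name. It logs only accepted requests (some variants log rejected ones too), and it takes the current time as a parameter so the behavior is easy to trace. Each key's deque can hold up to limit timestamps, which is where the memory cost comes from.

```python
from collections import deque


class SlidingWindowLog:
    """Sliding window log: one timestamp per accepted request (in-memory sketch)."""

    def __init__(self, limit: int, window: float):
        self.limit = limit      # max requests allowed in any window-length span
        self.window = window    # window length in seconds
        self.log: dict[str, deque] = {}

    def allow(self, key: str, now: float) -> bool:
        q = self.log.setdefault(key, deque())
        # Drop timestamps that have slid out of the window (now - window, now].
        while q and q[0] <= now - self.window:
            q.popleft()
        if len(q) < self.limit:
            q.append(now)
            return True
        return False


limiter = SlidingWindowLog(limit=3, window=10.0)
print([limiter.allow("client-a", t) for t in (0, 1, 2, 3, 10.5)])
# [True, True, True, False, True]
```

At t=10.5 the request from t=0 has aged out, so one slot frees up. A fixed window would instead reset all slots at once.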

Examples

A public API allows 1,000 requests per hour per API key. The gateway increments a Redis counter for the key, sets a one-hour TTL when the counter is first created, and returns an X-RateLimit-Remaining header so clients can self-throttle.
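Below is a minimal in-memory sketch of that counter. The dict stands in for Redis, where the gateway would use INCR and set the TTL with EXPIRE on the first hit. The class name and header handling are illustrative assumptions. As with INCR, rejected requests still increment the count.

```python
class FixedWindowCounter:
    """Fixed-window counter emulating Redis INCR plus a TTL set on the first hit."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.counters: dict[str, tuple[int, float]] = {}  # key -> (count, expires_at)

    def hit(self, key: str, now: float) -> tuple[bool, dict[str, str]]:
        count, expires_at = self.counters.get(key, (0, now + self.window))
        if now >= expires_at:
            # The TTL elapsed: Redis would have deleted the key, so start a fresh window.
            count, expires_at = 0, now + self.window
        count += 1  # like INCR, every request increments, including rejected ones
        self.counters[key] = (count, expires_at)
        remaining = max(0, self.limit - count)
        return count <= self.limit, {"X-RateLimit-Remaining": str(remaining)}


api = FixedWindowCounter(limit=1000, window=3600.0)
allowed, headers = api.hit("key-123", now=0.0)
print(allowed, headers)  # True {'X-RateLimit-Remaining': '999'}
```

The per-IP login limit in the next example has the same shape, e.g. FixedWindowCounter(limit=5, window=60.0) keyed by client IP.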

A login endpoint limits 5 attempts per minute per IP to slow brute-force attacks, using a short-window fixed counter.

A payment webhook consumer uses a leaky bucket to process events at a steady 50/sec, preventing downstream databases from being overwhelmed during traffic spikes.
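One way to sketch that consumer is a leaky bucket used as a queue. Each accepted event gets the next free processing slot, spaced 1/rate seconds apart, and events are dropped once too many are waiting. The class name and the capacity value are assumptions, since the example does not specify a queue size. A real consumer would wait until the returned time before processing the event.

```python
from typing import Optional


class LeakyBucket:
    """Leaky bucket as a queue: accepted events leave at a fixed rate (in-memory sketch)."""

    def __init__(self, rate: float, capacity: int):
        self.interval = 1.0 / rate   # seconds between consecutive outputs
        self.capacity = capacity     # max events allowed to wait in the bucket
        self.next_free = 0.0         # earliest time the next event may be processed

    def offer(self, now: float) -> Optional[float]:
        """Return the time at which to process this event, or None if the bucket overflows."""
        start = max(now, self.next_free)
        # (start - now) / interval counts the events already queued ahead of this one.
        waiting = (start - now) / self.interval
        if waiting >= self.capacity:
            return None
        self.next_free = start + self.interval
        return start


webhooks = LeakyBucket(rate=50.0, capacity=500)  # capacity is an illustrative choice
first = webhooks.offer(now=0.0)  # 0.0: an idle bucket processes immediately
```

However fast events arrive, accepted ones are released no faster than one every 20 ms. A spike fills the queue instead of hitting the database.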

Common Mistakes

  • Storing counters in per-server memory, so limits break behind a load balancer
  • Using a fixed window, which lets a client spend a full quota just before a boundary and another just after it (up to 2x the intended rate in a short span)
  • Rate limiting by IP only, punishing users behind shared NATs and proxies
  • Returning a generic error instead of 429 with Retry-After
  • Applying one global limit instead of tiering limits by endpoint cost or plan
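The fixed-window boundary burst is easy to check with arithmetic. The hypothetical function below replays request timestamps through a clock-aligned fixed window of 100 requests per 60 seconds. Requests at 59.9 s and at 60.0 s land in different windows, so all 200 are accepted within a tenth of a second. Any limiter that counts over a sliding 60-second span would accept only 100.

```python
def fixed_window_allows(timestamps: list[float], limit: int, window: float) -> int:
    """Count requests a clock-aligned fixed-window limiter would accept."""
    counts: dict[int, int] = {}
    accepted = 0
    for t in timestamps:
        w = int(t // window)  # index of the window containing t
        if counts.get(w, 0) < limit:
            counts[w] = counts.get(w, 0) + 1
            accepted += 1
    return accepted


# 100 requests just before the 60 s boundary, 100 just after it.
burst = [59.9] * 100 + [60.0] * 100
print(fixed_window_allows(burst, limit=100, window=60))  # 200
```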

See Also

system-design-apis, system-design-load-balancing, system-design-caching