Token Bucket Rate Limiter

Watch requests flow through the gate as tokens are consumed and refilled.

[Interactive demo: a client sends requests through a token bucket to a server. Controls adjust refill rate, bucket capacity, and request rate; counters track accepted, rejected, remaining tokens, and throughput. Keyboard: Space play/pause · R reset · B burst.]

What is Rate Limiting?

Rate limiting controls how many requests a client can make to a service. It protects servers from overload, ensures fair resource sharing, prevents abuse, and manages costs.

The Token Bucket Algorithm

Token bucket allows controlled bursts while enforcing an average rate. This fits real traffic patterns well since requests often arrive in bursts.

How it works:

  1. Bucket starts with capacity tokens (e.g., 10)
  2. Tokens refill at rate per second (e.g., 5/sec)
  3. Each request consumes 1 token (or N for weighted requests)
  4. Request accepted if tokens available, otherwise rejected
  5. Bucket never exceeds capacity (excess tokens discarded)

Capacity defines burst tolerance; refill rate defines sustained throughput. A bucket with capacity=100 and rate=10/sec absorbs a burst of 100 requests, then sustains 10/sec.
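A minimal single-process sketch of that refill-on-read loop, assuming the parameters above (the TokenBucket class and allow method are illustrative names, not a particular library's API):

```python
import time

class TokenBucket:
    """Single-process token bucket that refills lazily on each check."""

    def __init__(self, capacity: float, rate: float):
        self.capacity = capacity       # burst tolerance: max tokens held
        self.rate = rate               # sustained throughput: tokens per second
        self.tokens = capacity         # the bucket starts full
        self.last = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill for the elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost        # consume 1 token (or N for weighted requests)
            return True
        return False                   # no token available: reject


bucket = TokenBucket(capacity=10, rate=5)
print(bucket.allow())  # True: the bucket starts with 10 tokens
```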

Why Token Bucket?

Advantages

  • Handles bursty traffic gracefully
  • O(1) time and space per limiter
  • Simple to implement and reason about
  • Easy to tune with two parameters
  • No request queuing needed

Considerations

  • Bursts can overwhelm downstream services
  • Doesn't smooth traffic like a leaky bucket
  • Requires atomic operations in distributed systems
  • Needs memory per unique client in multi-tenant setups

Algorithm Comparison

Algorithm | Mechanism | Pros | Cons
Token Bucket | Tokens refill over time, consumed per request | Allows bursts; O(1); memory efficient | Bursts may overwhelm downstream
Leaky Bucket | Queue requests, process at fixed rate | Smooth output; prevents bursts | Adds latency; O(n) queue memory
Fixed Window | Counter resets each time window | Simple; O(1); easy to implement | 2x burst at window edges
Sliding Window Log | Store timestamp of each request | Precise; no boundary issues | O(n) time and space
Sliding Window Counter | Weighted average of adjacent windows | O(1); smoother than fixed window | Approximate, not an exact count
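For contrast with the token bucket, here is a hedged sketch of the sliding window counter row: it keeps only two counters and weights the previous fixed window by how much of it still overlaps the sliding window (class and field names are illustrative):

```python
import time

class SlidingWindowCounter:
    """Approximate limiter: weight the previous fixed window by its
    overlap with the sliding window ending now (illustrative sketch)."""

    def __init__(self, limit: int, window: float = 1.0):
        self.limit = limit
        self.window = window
        self.index = -1      # integer index of the current fixed window
        self.current = 0     # requests seen in the current fixed window
        self.previous = 0    # requests seen in the window before it

    def allow(self) -> bool:
        now = time.monotonic()
        index = int(now // self.window)
        if index != self.index:
            # Roll forward; a gap of 2+ windows means the old count is stale.
            self.previous = self.current if index == self.index + 1 else 0
            self.current = 0
            self.index = index
        # Fraction of the previous window still covered by the sliding window.
        overlap = 1.0 - (now % self.window) / self.window
        if self.current + self.previous * overlap < self.limit:
            self.current += 1
            return True
        return False
```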

Distributed Systems Considerations

In production, rate limiters typically run across multiple nodes. This introduces challenges:

Synchronization Strategy

Use a centralized store (Redis) with Lua scripts for atomic read-modify-write. Accept roughly 1-5 ms of added latency per request in exchange for accurate global counts.
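A sketch of that pattern with redis-py (the key layout and script body are assumptions, not a standard library): the Lua script runs the whole read-refill-consume sequence atomically on the Redis server.

```python
import time
import redis

# Redis executes the entire script without interleaving other commands,
# so the read-refill-consume sequence cannot race across nodes.
TOKEN_BUCKET_LUA = """
local capacity = tonumber(ARGV[1])
local rate     = tonumber(ARGV[2])
local now      = tonumber(ARGV[3])
local cost     = tonumber(ARGV[4])

local state  = redis.call('HMGET', KEYS[1], 'tokens', 'ts')
local tokens = tonumber(state[1]) or capacity   -- new clients start full
local ts     = tonumber(state[2]) or now

tokens = math.min(capacity, tokens + (now - ts) * rate)
local allowed = tokens >= cost
if allowed then tokens = tokens - cost end

redis.call('HSET', KEYS[1], 'tokens', tokens, 'ts', now)
-- TTL evicts state for idle clients (twice the time to refill fully).
redis.call('EXPIRE', KEYS[1], math.ceil(capacity / rate) * 2)
return allowed and 1 or 0
"""

r = redis.Redis()
script = r.register_script(TOKEN_BUCKET_LUA)

def allow(client_id: str, capacity: int = 10, rate: int = 5) -> bool:
    # Note: the timestamp comes from the caller's clock; skew between
    # app servers slightly distorts refill.
    return script(keys=[f"rl:{client_id}"],
                  args=[capacity, rate, time.time(), 1]) == 1
```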

Failure Handling

Fail open (allow requests) when Redis is unavailable, and fall back to a local in-memory limiter with smaller budgets so failing open doesn't turn the outage into an opening for abuse.
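One way that fallback might be wired up, reusing the TokenBucket and allow sketches from above (the fallback budgets are arbitrary examples, deliberately stricter than the global limit):

```python
import redis

# Reuses TokenBucket and allow() from the sketches above (assumptions).
local_buckets: dict[str, TokenBucket] = {}

def check_rate_limit(client_id: str) -> bool:
    try:
        return allow(client_id)                   # authoritative Redis check
    except redis.exceptions.ConnectionError:
        # Fail open via a stricter per-node bucket: requests still flow,
        # but the outage doesn't leave the origin completely unprotected.
        bucket = local_buckets.setdefault(client_id, TokenBucket(capacity=5, rate=2))
        return bucket.allow()
```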

Race Conditions

Lua scripts execute atomically in Redis. For other stores, use optimistic locking with CAS (Compare-And-Swap) operations.
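For stores without server-side scripting, the CAS retry loop might look like this sketch (the versioned store.get / store.compare_and_swap API is hypothetical):

```python
import time

def consume_with_cas(store, key: str, capacity: float, rate: float) -> bool:
    """Optimistic concurrency: retry when another node wins the write race.
    `store.get` and `store.compare_and_swap` are hypothetical versioned APIs."""
    while True:
        state, version = store.get(key)              # returns (value, version)
        tokens, ts = state if state else (capacity, time.time())
        now = time.time()
        tokens = min(capacity, tokens + (now - ts) * rate)
        allowed = tokens >= 1
        new_state = (tokens - 1 if allowed else tokens, now)
        # The write succeeds only if no one changed the key since our read.
        if store.compare_and_swap(key, new_state, expected_version=version):
            return allowed
```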

Production Best Practices

  • Response: Return 429 Too Many Requests with a Retry-After header (see the sketch after this list)
  • Headers: Include X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset
  • Granularity: Layer limits: global → per-tenant → per-user → per-endpoint
  • Monitoring: Track rejection rate, p99 latency impact, and top rejected clients
  • Placement: Apply limits at the edge (API Gateway, CDN) to reject early and protect the origin
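A hedged Flask sketch of the response shape: the headers follow the conventional X-RateLimit-* scheme, and the inline check() is a toy sliding-window-log stand-in for any of the limiters above.

```python
import time
from flask import Flask, jsonify, request

app = Flask(__name__)
LIMIT, WINDOW = 10, 1.0
counts: dict[str, list[float]] = {}   # toy per-client limiter state

def check(client: str) -> tuple[bool, int, float]:
    """Toy stand-in limiter: returns (allowed, remaining, reset_epoch)."""
    now = time.time()
    stamps = [t for t in counts.get(client, []) if t > now - WINDOW]
    allowed = len(stamps) < LIMIT
    if allowed:
        stamps.append(now)
    counts[client] = stamps
    reset_at = (stamps[0] + WINDOW) if stamps else now
    return allowed, max(0, LIMIT - len(stamps)), reset_at

@app.route("/api/resource")
def resource():
    allowed, remaining, reset_at = check(request.remote_addr)
    headers = {
        "X-RateLimit-Limit": str(LIMIT),
        "X-RateLimit-Remaining": str(remaining),
        "X-RateLimit-Reset": str(int(reset_at)),
    }
    if not allowed:
        # Tell well-behaved clients how long to back off.
        headers["Retry-After"] = str(max(1, int(reset_at - time.time())))
        return jsonify(error="rate limit exceeded"), 429, headers
    return jsonify(ok=True), 200, headers
```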

Real-World Examples

  • AWS API Gateway: Token Bucket
  • Stripe: Token Bucket
  • Shopify: Leaky Bucket