Token Bucket Rate Limiter
(Interactive demo: watch requests flow through the gate as tokens are consumed and refilled. The demo tracks accepted and rejected requests, current tokens, and throughput.)
What is Rate Limiting?
Rate limiting controls how many requests a client can make to a service. It protects servers from overload, ensures fair resource sharing, prevents abuse, and manages costs.
The Token Bucket Algorithm
The token bucket algorithm allows controlled bursts while enforcing an average rate. This fits real traffic well, since requests tend to arrive in bursts rather than at a steady pace.
How it works:
- Bucket starts full with capacity tokens (e.g., 10)
- Tokens refill at a fixed rate (e.g., 5/sec)
- Each request consumes 1 token (or N for weighted requests)
- Request accepted if tokens available, otherwise rejected
- Bucket never exceeds capacity (excess tokens discarded)
Capacity defines burst tolerance, refill rate defines sustained throughput. A bucket with capacity=100 and rate=10/sec allows a burst of 100 requests, then sustains 10/sec.
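To make the mechanics concrete, here is a minimal single-process sketch in Python; the class name, parameters, and use of a monotonic clock are illustrative choices, not a specific library's API:

```python
import time

class TokenBucket:
    """Minimal single-process token bucket (illustrative sketch)."""

    def __init__(self, capacity: float, rate: float):
        self.capacity = capacity      # burst tolerance (max tokens)
        self.rate = rate              # refill rate, tokens per second
        self.tokens = capacity        # bucket starts full
        self.last = time.monotonic()  # time of last refill

    def allow(self, cost: float = 1.0) -> bool:
        """Consume `cost` tokens if available; return True if accepted."""
        now = time.monotonic()
        # Lazy refill: add tokens for the elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

bucket = TokenBucket(capacity=10, rate=5.0)
print(bucket.allow())  # True while tokens remain
```

Refilling lazily on each call, instead of running a background timer, is what keeps the limiter O(1) in time and space.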
Why Token Bucket?
Advantages
- Handles bursty traffic gracefully
- O(1) time and space per limiter
- Simple to implement and reason about
- Easy to tune with two parameters
- No request queuing needed
Considerations
- Bursts can overwhelm downstream services
- Doesn't smooth traffic like leaky bucket
- Requires atomic operations in distributed systems
- Requires state per unique client in multi-tenant deployments
Algorithm Comparison
| Algorithm | Mechanism | Pros | Cons |
|---|---|---|---|
| Token Bucket | Tokens refill over time, consumed per request | Allows bursts, O(1), memory efficient | Bursts may overwhelm downstream |
| Leaky Bucket | Queue requests, process at fixed rate | Smooth output, prevents bursts | Adds latency, queue memory O(n) |
| Fixed Window | Counter resets each time window | Simple, O(1), easy to implement | 2x burst at window edges |
| Sliding Window Log | Store timestamp of each request | Precise, no boundary issues | O(n) time and space |
| Sliding Window Counter | Weighted avg of adjacent windows | O(1), smoother than fixed window | Approximate, not exact count |
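As one illustration from the table, the sliding window counter's weighted average fits in a few lines. This is a toy sketch with invented names; a real implementation would also evict stale window counts:

```python
import time
from collections import defaultdict

class SlidingWindowCounter:
    """Approximate sliding-window limiter (toy sketch; never evicts old windows)."""

    def __init__(self, limit: int, window: float = 1.0):
        self.limit = limit              # max requests per window
        self.window = window            # window length in seconds
        self.counts = defaultdict(int)  # window index -> request count

    def allow(self) -> bool:
        now = time.time()
        cur = int(now // self.window)             # index of the current window
        frac = (now % self.window) / self.window  # fraction of current window elapsed
        # Weighted average: assume the previous window's requests were
        # evenly spread, so (1 - frac) of them still "count" right now.
        estimate = self.counts[cur - 1] * (1.0 - frac) + self.counts[cur]
        if estimate < self.limit:
            self.counts[cur] += 1
            return True
        return False
```

The even-spread assumption is exactly why the count is approximate rather than exact.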
Distributed Systems Considerations
In production, rate limiters typically run across multiple nodes. This introduces challenges:
Synchronization Strategy
Use a centralized store (e.g., Redis) with Lua scripts for atomic read-modify-write, accepting roughly 1-5 ms of added latency per request in exchange for accuracy.
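A sketch of that pattern with redis-py follows; the Lua script is embedded as a string, and the key prefix (rl:), hash fields (tokens, ts), and 60-second expiry are illustrative choices. The timestamp is passed in from the client so the script stays deterministic:

```python
import time
import redis

# Atomic token-bucket check: the Lua script runs as a single step in Redis,
# so no other client can interleave between the read and the write.
TOKEN_BUCKET_LUA = """
local key = KEYS[1]
local capacity = tonumber(ARGV[1])
local rate = tonumber(ARGV[2])
local now = tonumber(ARGV[3])

local state = redis.call('HMGET', key, 'tokens', 'ts')
local tokens = tonumber(state[1]) or capacity
local ts = tonumber(state[2]) or now

-- Lazy refill since the last request, capped at capacity.
tokens = math.min(capacity, tokens + (now - ts) * rate)

local allowed = 0
if tokens >= 1 then
    tokens = tokens - 1
    allowed = 1
end

redis.call('HSET', key, 'tokens', tokens, 'ts', now)
redis.call('EXPIRE', key, 60)  -- drop idle buckets
return allowed
"""

r = redis.Redis()
take_token = r.register_script(TOKEN_BUCKET_LUA)

def allow(client_id: str, capacity: int = 10, rate: float = 5.0) -> bool:
    return bool(take_token(keys=[f"rl:{client_id}"],
                           args=[capacity, rate, time.time()]))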
Failure Handling
Fail open (allow requests) when Redis is unavailable, and fall back to a local in-memory limiter with tighter limits so that an outage neither blocks all traffic nor leaves the service open to an unthrottled flood.
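One way to express that policy, reusing the allow() helper and TokenBucket class from the sketches above (the fallback limits are arbitrary):

```python
from collections import defaultdict

import redis

# Tighter local fallback buckets, one per client (limits here are arbitrary).
local_buckets = defaultdict(lambda: TokenBucket(capacity=5, rate=2.0))

def allow_with_fallback(client_id: str) -> bool:
    try:
        return allow(client_id)  # Redis-backed path from the sketch above
    except redis.exceptions.ConnectionError:
        # Fail over to a coarser local limit: a Redis outage neither
        # blocks all traffic nor removes rate limiting entirely.
        return local_buckets[client_id].allow()
```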
Race Conditions
Lua scripts execute atomically in Redis. For stores without scripting, use optimistic locking with compare-and-swap (CAS) operations.
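A generic sketch of that retry loop, written against a hypothetical store interface rather than any real client library:

```python
import time

def allow_cas(store, key: str, capacity: float, rate: float) -> bool:
    # `store.get` and `store.cas` are hypothetical: get returns
    # (state_dict, version); cas writes only if the version is unchanged
    # (think etcd revisions or DynamoDB conditional writes).
    while True:
        state, version = store.get(key)
        now = time.time()
        tokens = min(capacity, state["tokens"] + (now - state["ts"]) * rate)
        allowed = tokens >= 1.0
        if allowed:
            tokens -= 1.0
        if store.cas(key, {"tokens": tokens, "ts": now}, version):
            return allowed
        # A concurrent writer changed the state first; retry with fresh state.
```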
Production Best Practices
- Response: Return 429 Too Many Requests with a Retry-After header (see the sketch after this list)
- Headers: Include X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset
- Granularity: Layer limits from global → per-tenant → per-user → per-endpoint
- Monitoring: Track rejection rate, p99 latency impact, and top rejected clients
- Placement: Apply limits at the edge (API Gateway, CDN) to reject early and protect the origin
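For example, a rejection path might look like the following. Flask is an arbitrary framework choice, it reuses the TokenBucket sketch from earlier, and the header values are hardcoded here where a real handler would compute them from the bucket's state:

```python
from flask import Flask, jsonify

app = Flask(__name__)
bucket = TokenBucket(capacity=10, rate=5.0)  # from the first sketch

@app.route("/api/resource")
def resource():
    if not bucket.allow():
        resp = jsonify(error="rate limit exceeded")
        resp.status_code = 429                      # Too Many Requests
        resp.headers["Retry-After"] = "1"           # seconds until a token refills
        resp.headers["X-RateLimit-Limit"] = "10"    # bucket capacity
        resp.headers["X-RateLimit-Remaining"] = "0"
        resp.headers["X-RateLimit-Reset"] = "1"     # seconds until tokens are available
        return resp
    return jsonify(ok=True)
```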