Why Rate Limit?
Without rate limiting, a single misbehaving client — whether a bug, a scraper, or an attacker — can degrade service for everyone. Rate limiting is the first line of defense for API availability.
Fixed Window vs Sliding Window
Fixed window counters reset at clock boundaries, which allows burst traffic at window edges. Sliding window algorithms track requests over a rolling time period, providing smoother control at the cost of more memory.
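One common sliding-window approximation (names below are illustrative, not from the source) weights the previous fixed window's count by how much of it still overlaps the rolling window, smoothing edge bursts without storing every timestamp:

```typescript
// Sliding-window counter sketch: estimate requests in the last
// `windowSeconds` by blending the previous and current fixed windows.
function slidingWindowCount(
  prevWindowCount: number,  // requests in the previous fixed window
  currWindowCount: number,  // requests so far in the current fixed window
  elapsedInCurrent: number, // seconds elapsed in the current window
  windowSeconds: number
): number {
  // Fraction of the previous window still inside the rolling window.
  const overlap = (windowSeconds - elapsedInCurrent) / windowSeconds
  return currWindowCount + prevWindowCount * overlap
}

// Halfway through a 60 s window, half the previous window still counts:
// slidingWindowCount(100, 20, 30, 60) → 20 + 100 * 0.5 = 70
```

This trades exactness for memory: only two counters per client, versus one entry per request for a true sliding log.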
Token Bucket
Token bucket allows controlled bursts: tokens accumulate at a fixed rate up to a maximum capacity, and each request consumes a token. It underpins many production rate limiters, including the throttling in AWS API Gateway.
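A minimal in-process sketch of the idea (class and parameter names are illustrative):

```typescript
// Token bucket: tokens refill continuously at `ratePerSec` up to
// `capacity`; each request spends one token or is rejected.
class TokenBucket {
  private tokens: number
  private lastRefill: number

  constructor(
    private capacity: number,
    private ratePerSec: number,
    now: number = Date.now()
  ) {
    this.tokens = capacity // start full, allowing an initial burst
    this.lastRefill = now
  }

  tryConsume(now: number = Date.now()): boolean {
    // Refill based on time elapsed since the last call.
    const elapsed = (now - this.lastRefill) / 1000
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.ratePerSec)
    this.lastRefill = now
    if (this.tokens >= 1) {
      this.tokens -= 1
      return true
    }
    return false // bucket empty: reject (or queue) the request
  }
}
```

Capacity controls the burst size; the refill rate controls the sustained throughput, so the two can be tuned independently.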
Redis-Backed Rate Limiting
For distributed systems, per-instance in-memory counters don't work — each instance sees only its own traffic, so the effective limit multiplies with the instance count. Use Redis with atomic Lua scripts or the INCR/EXPIRE pattern for cross-instance coordination.
// Increment the per-client counter; create it with a TTL on the first hit.
const key = `ratelimit:${ip}:${endpoint}`
const count = await redis.incr(key)
// Set the expiry only when the key is new. Caveat: if the process dies
// between INCR and EXPIRE, the key never expires — an atomic Lua script
// combining both steps avoids this.
if (count === 1) await redis.expire(key, windowSeconds)
if (count > limit) return 429
Tiered Limits
Different endpoints deserve different limits. Auth endpoints (login, register) warrant the strictest limits, since they are prime targets for credential stuffing; write endpoints warrant moderate limits; read endpoints can be the most generous. Implement per-route configuration.
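Per-route configuration can be as simple as a lookup table with a fallback (routes and numbers below are examples, not recommendations):

```typescript
// Per-route rate limit configuration with a default fallback.
type RouteLimit = { windowSeconds: number; limit: number }

const routeLimits: Record<string, RouteLimit> = {
  "POST /login":    { windowSeconds: 60, limit: 5 },   // auth: strictest
  "POST /register": { windowSeconds: 60, limit: 3 },
  "POST /orders":   { windowSeconds: 60, limit: 30 },  // writes: moderate
  "GET /products":  { windowSeconds: 60, limit: 300 }, // reads: generous
}

const defaultLimit: RouteLimit = { windowSeconds: 60, limit: 100 }

function limitFor(route: string): RouteLimit {
  return routeLimits[route] ?? defaultLimit
}
```

Keeping the table in one place makes limits auditable and easy to tune without touching handler code.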
Headers
Always return rate limit headers: X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset. On 429 responses, also include Retry-After. Clients need this information to implement respectful backoff.
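A small helper can derive these headers from the limiter's state (the helper and its parameters are hypothetical; X-RateLimit-Reset here carries a Unix timestamp, a common though not universal convention):

```typescript
// Build the conventional X-RateLimit-* response headers.
function rateLimitHeaders(
  limit: number,
  used: number,
  resetEpochSeconds: number
): Record<string, string> {
  return {
    "X-RateLimit-Limit": String(limit),
    // Clamp at zero so over-limit requests don't report negative values.
    "X-RateLimit-Remaining": String(Math.max(0, limit - used)),
    "X-RateLimit-Reset": String(resetEpochSeconds),
  }
}
```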