Why Rate Limit?
Without rate limiting, a single misbehaving client — whether a bug, a scraper, or an attacker — can degrade service for everyone. Rate limiting is the first line of defense for API availability.
Fixed Window vs Sliding Window
Fixed window counters reset at clock boundaries, which allows burst traffic at window edges. Sliding window algorithms track requests over a rolling time period, providing smoother control at the cost of more memory.
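One common sliding-window approximation (names below are illustrative, not from the source) weights the previous fixed window's count by how much of it still overlaps the rolling window, smoothing edge bursts without storing every timestamp:

```typescript
// Sliding-window counter sketch: estimate requests in the last
// `windowSeconds` by blending the previous and current fixed windows.
function slidingWindowCount(
  prevWindowCount: number,  // requests in the previous fixed window
  currWindowCount: number,  // requests so far in the current fixed window
  elapsedInCurrent: number, // seconds elapsed in the current window
  windowSeconds: number
): number {
  // Fraction of the previous window still inside the rolling window.
  const overlap = (windowSeconds - elapsedInCurrent) / windowSeconds
  return currWindowCount + prevWindowCount * overlap
}

// Halfway through a 60 s window, half the previous window still counts:
// slidingWindowCount(100, 20, 30, 60) → 20 + 100 * 0.5 = 70
```

This trades exactness for memory: only two counters per client, versus one entry per request for a true sliding log.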
Token Bucket
Token bucket allows controlled bursts: tokens accumulate at a fixed rate up to a maximum capacity, and each request consumes a token. It underpins many production rate limiters, including the throttling in AWS API Gateway.
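A minimal in-process sketch of the idea (class and parameter names are illustrative):

```typescript
// Token bucket: tokens refill continuously at `ratePerSec` up to
// `capacity`; each request spends one token or is rejected.
class TokenBucket {
  private tokens: number
  private lastRefill: number

  constructor(
    private capacity: number,
    private ratePerSec: number,
    now: number = Date.now()
  ) {
    this.tokens = capacity // start full, allowing an initial burst
    this.lastRefill = now
  }

  tryConsume(now: number = Date.now()): boolean {
    // Refill based on time elapsed since the last call.
    const elapsed = (now - this.lastRefill) / 1000
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.ratePerSec)
    this.lastRefill = now
    if (this.tokens >= 1) {
      this.tokens -= 1
      return true
    }
    return false // bucket empty: reject (or queue) the request
  }
}
```

Capacity controls the burst size; the refill rate controls the sustained throughput, so the two can be tuned independently.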
Redis-Backed Rate Limiting
For distributed systems, per-instance in-memory counters don't work — each instance sees only its own traffic, so the effective limit multiplies with the instance count. Use Redis with atomic Lua scripts or the INCR/EXPIRE pattern for cross-instance coordination.
// Increment the per-client counter; create it with a TTL on the first hit.
const key = `ratelimit:${ip}:${endpoint}`
const count = await redis.incr(key)
// Set the expiry only when the key is new. Caveat: if the process dies
// between INCR and EXPIRE, the key never expires — an atomic Lua script
// combining both steps avoids this.
if (count === 1) await redis.expire(key, windowSeconds)
if (count > limit) return 429
Tiered Limits
Different endpoints deserve different limits. Auth endpoints (login, register) warrant the strictest limits, since they are prime targets for credential stuffing; write endpoints warrant moderate limits; read endpoints can be the most generous. Implement per-route configuration.
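Per-route configuration can be as simple as a lookup table with a fallback (routes and numbers below are examples, not recommendations):

```typescript
// Per-route rate limit configuration with a default fallback.
type RouteLimit = { windowSeconds: number; limit: number }

const routeLimits: Record<string, RouteLimit> = {
  "POST /login":    { windowSeconds: 60, limit: 5 },   // auth: strictest
  "POST /register": { windowSeconds: 60, limit: 3 },
  "POST /orders":   { windowSeconds: 60, limit: 30 },  // writes: moderate
  "GET /products":  { windowSeconds: 60, limit: 300 }, // reads: generous
}

const defaultLimit: RouteLimit = { windowSeconds: 60, limit: 100 }

function limitFor(route: string): RouteLimit {
  return routeLimits[route] ?? defaultLimit
}
```

Keeping the table in one place makes limits auditable and easy to tune without touching handler code.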
Headers
Always return rate limit headers: X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset. On 429 responses, also include Retry-After. Clients need this information to implement respectful backoff.
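A small helper can derive these headers from the limiter's state (the helper and its parameters are hypothetical; X-RateLimit-Reset here carries a Unix timestamp, a common though not universal convention):

```typescript
// Build the conventional X-RateLimit-* response headers.
function rateLimitHeaders(
  limit: number,
  used: number,
  resetEpochSeconds: number
): Record<string, string> {
  return {
    "X-RateLimit-Limit": String(limit),
    // Clamp at zero so over-limit requests don't report negative values.
    "X-RateLimit-Remaining": String(Math.max(0, limit - used)),
    "X-RateLimit-Reset": String(resetEpochSeconds),
  }
}
```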