📑 Rate Limiter System Design Cheat Sheet
1. Problem Overview
Rate limiting controls how many requests a client may send to a server within a given time period.
Goals:
- Prevent DoS attacks
- Reduce server overload
- Control API costs
- Maintain fair usage
2. Requirements
- ⏱️ Low latency (should not slow down requests)
- 🛑 Accurate throttling (hard or soft limits)
- 🧠 Flexible rules (per-user, per-IP, per-endpoint)
- ⚡ High fault tolerance (rate limiter failure ≠ system failure)
- 🌍 Distributed support (works across servers)
- 📢 User feedback (`429 Too Many Requests`, headers with retry info)
3. High-Level Design
```mermaid
flowchart LR
    C[Client] --> RL[Rate Limiter Middleware]
    RL -->|Allowed| API[API Server]
    RL -->|Throttled| E[HTTP 429 Response]
```
- Middleware or API Gateway placement.
- Uses in-memory stores (Redis) for counters.
- Returns rate-limit headers to clients.
4. Algorithms Deep Dive
4.1 Token Bucket
- Tokens refill at a constant rate; each request consumes one token (sketch below).
- Pros: Handles bursts; widely used (Stripe, AWS).
- Cons: Requires careful parameter tuning.
- 🔗 AWS API Gateway Throttling
- Summary: AWS API Gateway supports configurable rate limits and burst limits using token bucket algorithm to control traffic.
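A minimal in-process sketch of a token bucket; the capacity and refill rate below are illustrative values, not taken from any of the linked providers:

```python
import time

class TokenBucket:
    """Refills tokens at a fixed rate; each request consumes one token."""

    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity          # max burst size
        self.refill_rate = refill_rate    # tokens added per second
        self.tokens = float(capacity)
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Add tokens accrued since the last check, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# e.g. ~10 requests/second sustained, with bursts of up to 20
bucket = TokenBucket(capacity=20, refill_rate=10)
```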
4.2 Leaky Bucket
- FIFO queue; requests are processed at a constant rate (sketch below).
- Pros: Smooths traffic.
- Cons: A burst of old requests can fill the queue and delay newer ones.
- 🔗 Shopify API Limits
- Summary: Shopify enforces leaky-bucket-based limits per app and per store, ensuring fair API usage.
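A rough sketch of a leaky bucket as a bounded FIFO queue drained at a constant rate; the queue size and outflow rate are made-up parameters:

```python
import time
from collections import deque

class LeakyBucket:
    """Bounded FIFO queue drained at a constant rate; overflow is rejected."""

    def __init__(self, capacity: int, leak_rate: float):
        self.capacity = capacity      # max queued requests
        self.leak_rate = leak_rate    # requests processed per second
        self.queue = deque()
        self.last_leak = time.monotonic()

    def allow(self, request_id) -> bool:
        now = time.monotonic()
        # Drain requests that would have been processed since the last call.
        leaked = int((now - self.last_leak) * self.leak_rate)
        if leaked:
            for _ in range(min(leaked, len(self.queue))):
                self.queue.popleft()
            self.last_leak = now
        if len(self.queue) < self.capacity:
            self.queue.append(request_id)
            return True
        return False   # bucket full: request dropped / throttled

bucket = LeakyBucket(capacity=100, leak_rate=5)  # drain ~5 req/s
```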
4.3 Fixed Window Counter
- Time is divided into fixed intervals, each with its own counter (sketch below).
- Pros: Simple, low memory.
- Cons: Bursts at window edges.
- 🔗 Twitter API Limits
- Summary: Twitter enforces strict 15-minute windows for different endpoints; limits vary by resource type.
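A minimal per-key fixed window counter, kept in process memory for illustration; a production version would hold the counters in Redis:

```python
import time
from collections import defaultdict

class FixedWindowCounter:
    """Counts requests per key within the current fixed time window."""

    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window = window_seconds
        self.counters = defaultdict(int)   # (key, window_index) -> count

    def allow(self, key: str) -> bool:
        window_index = int(time.time() // self.window)
        bucket = (key, window_index)
        if self.counters[bucket] < self.limit:
            self.counters[bucket] += 1
            return True
        return False

limiter = FixedWindowCounter(limit=100, window_seconds=60)  # 100 req/min per key
```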
4.4 Sliding Window Log
- Store request timestamps in a log; prune entries older than the window (sketch below).
- Pros: Accurate; no burst bypass.
- Cons: High memory usage.
- 🔗 Redis Sorted Set Rate Limiting
- Summary: Implements rolling rate limits with Redis sorted sets to efficiently handle timestamped requests.
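A sketch of a sliding window log backed by a Redis sorted set, in the spirit of the linked article; the key names and limits here are assumptions:

```python
import time
import uuid
import redis

r = redis.Redis()

def allow(key: str, limit: int, window_seconds: int) -> bool:
    """Keep one timestamped entry per request; count entries inside the window."""
    now = time.time()
    member = f"{now}:{uuid.uuid4()}"                     # unique member per request
    pipe = r.pipeline()
    pipe.zremrangebyscore(key, 0, now - window_seconds)  # prune outdated entries
    pipe.zadd(key, {member: now})                        # log this request
    pipe.zcard(key)                                      # count requests in window
    pipe.expire(key, window_seconds)                     # let idle keys expire
    _, _, count, _ = pipe.execute()
    return count <= limit

# e.g. at most 50 requests per rolling 60 seconds for this user
# allow("rate:user:42", limit=50, window_seconds=60)
```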
4.5 Sliding Window Counter
- Estimates a rolling count by weighting the previous window's count against the current one (sketch below).
- Pros: Efficient approximation.
- Cons: Assumes requests were evenly distributed across the previous window.
- 🔗 Cloudflare Scaling
- Summary: Cloudflare explains its approximate sliding window counter and how it scales to billions of requests.
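A sketch of the weighted-count approximation, kept in process memory for brevity; it assumes requests were spread roughly evenly across the previous window:

```python
import time
from collections import defaultdict

class SlidingWindowCounter:
    """Approximates a rolling count from the current and previous fixed windows."""

    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window = window_seconds
        self.counts = defaultdict(int)  # (key, window_index) -> count

    def allow(self, key: str) -> bool:
        now = time.time()
        current = int(now // self.window)
        elapsed_fraction = (now % self.window) / self.window
        prev_count = self.counts[(key, current - 1)]
        cur_count = self.counts[(key, current)]
        # Weight the previous window by how much of it still overlaps the rolling window.
        estimated = prev_count * (1 - elapsed_fraction) + cur_count
        if estimated < self.limit:
            self.counts[(key, current)] += 1
            return True
        return False

limiter = SlidingWindowCounter(limit=100, window_seconds=60)
```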
5. Detailed Architecture
```mermaid
flowchart TD
    Client --> RL[Rate Limiter]
    RL --> Redis[Redis Cache]
    Redis --> RL
    RL -->|Allowed| API[API Server]
    RL -->|429 + Retry Header| Client
```
- Redis used for counters & timestamps.
- Rate-limit rules are config-driven and pulled into a local cache.
- Lua scripts / atomic operations handle concurrency (see the Lua sketch below).
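One way to keep the check atomic is to run it as a single Redis Lua script, so the read-increment-expire sequence cannot interleave across app servers. This is a sketch of an atomic fixed-window check; the script and key names are illustrative:

```python
import redis

r = redis.Redis()

# INCR the window counter and set its TTL in one atomic server-side step.
FIXED_WINDOW_LUA = """
local count = redis.call('INCR', KEYS[1])
if count == 1 then
  redis.call('EXPIRE', KEYS[1], ARGV[2])
end
if count > tonumber(ARGV[1]) then
  return 0
end
return 1
"""
fixed_window = r.register_script(FIXED_WINDOW_LUA)

def allow(key: str, limit: int, window_seconds: int) -> bool:
    return fixed_window(keys=[key], args=[limit, window_seconds]) == 1

# allow("rate:ip:203.0.113.7:60s", limit=100, window_seconds=60)
```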
6. Distributed Environment Challenges
- Race conditions: solved with Redis Lua scripts or atomic operations (contrast sketched below).
- Synchronization: centralized stores preferred over sticky sessions.
- Performance: edge servers reduce latency.
- 🔗 Lyft Ratelimit OSS
- Summary: Lyft open-sourced their Envoy-based rate limiter, supporting rule configuration and distributed caching.
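To make the race concrete: a client-side read-modify-write can interleave between two app servers, while a single server-side INCR (or the Lua script above) cannot. The key and limit below are illustrative:

```python
import redis

r = redis.Redis()
key, limit = "rate:user:42:60s", 100

# Racy: two servers can both read 99, both write 100, letting 101 requests through.
count = int(r.get(key) or 0)
if count < limit:
    r.set(key, count + 1, ex=60)

# Atomic: INCR runs server-side, so concurrent callers always see distinct counts.
count = r.incr(key)
if count == 1:
    r.expire(key, 60)
allowed = count <= limit
```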
7. Client Communication
- HTTP 429: Too Many Requests.
- Headers (example response below):
  - `X-RateLimit-Remaining`
  - `X-RateLimit-Limit`
  - `X-RateLimit-Retry-After`
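A small sketch of how middleware might surface these headers on a throttled response. Flask and the in-memory counter are assumptions for illustration; any framework and store would work:

```python
import time
from flask import Flask, jsonify, request

app = Flask(__name__)
LIMIT, WINDOW = 100, 60
counts = {}  # tiny in-memory fixed-window counter keyed by (client IP, window)

@app.route("/api/resource")
def resource():
    window = int(time.time() // WINDOW)
    key = (request.remote_addr, window)
    counts[key] = counts.get(key, 0) + 1
    remaining = LIMIT - counts[key]
    headers = {
        "X-RateLimit-Limit": str(LIMIT),
        "X-RateLimit-Remaining": str(max(remaining, 0)),
    }
    if remaining < 0:
        # Tell the client how long until the current window resets.
        headers["X-RateLimit-Retry-After"] = str(WINDOW - int(time.time()) % WINDOW)
        return jsonify(error="Too Many Requests"), 429, headers
    return jsonify(data="ok"), 200, headers
```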
8. Monitoring & Optimization
- Gather analytics to tune thresholds.
- Adjust for flash sales/bursty traffic.
- Support multi-datacenter deployment.
- 🔗 Google Cloud Rate Limiting Strategies
- Summary: Google describes different strategies (fixed window, sliding window, token bucket, leaky bucket) and their scalability trade-offs.
9. Reference Materials with Summaries
- Google Cloud: Rate Limiting Strategies — overview of techniques and best practices.
- Twitter API Limits — strict resource-based API limits.
- Google Docs API Limits — quotas like 300 reads per user per 60s.
- Stripe Rate Limiters — practical use of token bucket at scale.
- Shopify API Limits — explains per-app and per-store rules.
- Redis — in-memory cache enabling atomic counter ops.
- Lyft Ratelimit OSS — distributed rate limiter with rule configs.
- Cloudflare Scaling — how to scale rate limiting across billions of domains.
If you liked this cheat sheet on Rate Limiting, you may like my cheat sheet for a URL Shortener :)