URL Shortener System Design – Blended Cheat Sheet (ByteByteGo-style)
This cheat sheet blends the common core concepts taught in popular URL shortener system-design walkthroughs (capacity planning, API shape, ID strategies, caching tiers, sharding, and multi‑region considerations) with the practical tactics we discussed (hot-key protection, async analytics, backpressure). It’s structured as: high-level → diagrams → deep dives, with cross-links.
High-Level Architecture
```mermaid
flowchart LR
    A[Client] --> B[CDN/Edge]
    B --> C[API Gateway]
    C --> D{"L1+L2 Cache<br/>L1 in-process, L2 Redis/Memcached"}
    D -- hit --> E[302 Redirect]
    D -- miss --> F[(Primary KV/SQL DB)]
    F --> D
    E --> G[Browser follows to Long URL]
    C -. async .-> H[Emitter bounded queue]
    H --> I[[Kafka / PubSub]]
    I --> J[Stream Processing<br/>Flink/Streams]
    J --> K[(Analytics Store<br/>ClickHouse/BigQuery)]
```
Key ideas (ByteByteGo-style essentials):
- Clarify requirements: create, redirect, custom alias, optional expiration, analytics, high availability, low latency.
- Capacity planning (assumptions to size against): target QPS, reads ≫ writes, storage growth, hot-link behavior, multi-region needs.
- API surface: `POST /shorten`, `GET /:code`, optional `PUT /:code` (update), `DELETE /:code`, `GET /:code/stats`.
- Data model: `code` (PK), `long_url`, `created_at`, `owner_id`, `ttl/expiry`, optional `checksum`, `flags`.
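A minimal sketch of that record as a Python dataclass, assuming the fields listed above; the class and field names are illustrative, not a fixed schema:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

@dataclass
class Link:
    """One row/item in the primary KV/SQL store, keyed by code."""
    code: str                                  # primary key, e.g. "aZ3kQ9x"
    long_url: str
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    owner_id: Optional[str] = None
    expires_at: Optional[datetime] = None      # ttl/expiry; absent = permanent
    checksum: Optional[str] = None             # optional dedup key over long_url
    flags: int = 0                             # e.g. disabled / flagged-for-review bits
```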
Jump to: ID Generation, Redirect Flow, Caching, Sharding, Hot Keys, Analytics & Backpressure.
URL Shortening – Deep Dive
```mermaid
sequenceDiagram
    participant Client
    participant API as API Service
    participant ID as ID Generator
    participant DB as DB (KV/SQL)
    participant Cache as L2 Cache (Redis)
    Client->>API: POST /shorten {long_url, alias?, ttl?}
    API->>API: validate/normalize URL, policy checks
    API->>ID: generate code (Counter/Snowflake/Random/KGS)
    ID-->>API: code
    API->>DB: INSERT (code, long_url, meta) UNIQUE(code)
    DB-->>API: OK (retry on conflict if random)
    API->>Cache: SET s:code -> long_url (TTL per policy)
    API-->>Client: 201 {short_url}
```
Notes:
- Validate/normalize (http→https? trailing slashes? UTM handling).
- Enforce domain allow/deny lists, phishing/malware scans (async if needed).
- Dedup optional (checksum of `long_url` ⇒ "return existing code").
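A sketch of this create path, assuming hypothetical `db`, `cache`, and `generate_code` interfaces (the `CodeConflict` exception stands in for a `UNIQUE(code)` violation):

```python
import hashlib
from urllib.parse import urlsplit, urlunsplit

class CodeConflict(Exception):
    """Raised by db.insert_link when UNIQUE(code) is violated (assumed interface)."""

def normalize(long_url: str) -> str:
    # Minimal normalization: lowercase scheme/host, drop the fragment.
    parts = urlsplit(long_url.strip())
    if parts.scheme not in ("http", "https"):
        raise ValueError("unsupported scheme")
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       parts.path or "/", parts.query, ""))

def shorten(long_url: str, db, cache, generate_code, ttl_seconds=None, max_retries=3):
    url = normalize(long_url)
    checksum = hashlib.sha256(url.encode()).hexdigest()

    existing = db.find_by_checksum(checksum)           # optional dedup
    if existing:
        return existing.code

    for _ in range(max_retries):                       # retries only matter for random codes
        code = generate_code()
        try:
            db.insert_link(code=code, long_url=url,    # INSERT ... with UNIQUE(code)
                           checksum=checksum, ttl_seconds=ttl_seconds)
        except CodeConflict:
            continue                                   # regenerate and retry
        cache.set(f"s:{code}", url, ex=ttl_seconds)    # warm the L2 cache
        return code
    raise RuntimeError("could not allocate a unique code")
```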
Unique ID Generation
- Base62/Base64 = encoding, not uniqueness.
- Strategies:
  - Counter + encode: short, predictable, no collisions (see the counter + Base62 sketch after this list).
  - Snowflake IDs: distributed, collision-free, ~11 chars.
  - Random tokens: 72–96 bits + unique index, retried on collision.
  - KGS: pre-mint short codes, lease atomically.
- Deep dive:
  - Counter + encode → simplest, but predictable.
  - Snowflake → distributed, safe at scale.
  - Random + unique index → best for privacy/security.
  - KGS → ensures short vanity codes stay unique.
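A minimal counter + Base62 sketch; the in-process `itertools.count` stands in for a shared or distributed counter, which is an assumption for illustration only:

```python
import itertools

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"  # Base62

def encode_base62(n: int) -> str:
    if n == 0:
        return ALPHABET[0]
    out = []
    while n:
        n, rem = divmod(n, 62)
        out.append(ALPHABET[rem])
    return "".join(reversed(out))

counter = itertools.count(start=1_000_000)   # stand-in for a shared counter service

def next_code() -> str:
    return encode_base62(next(counter))      # e.g. 1_000_000 -> "4c92"

print(next_code(), next_code())
```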
Redirect Flow
```mermaid
flowchart LR
    A[GET /:code] --> B{L1 Cache?}
    B -- hit --> C[302 with Location]
    B -- miss --> D{L2 Cache?}
    D -- hit --> C
    D -- miss --> E[(DB Lookup by code)]
    E --> D
    C --> F[Browser → Long URL]
    A -. non-blocking .-> G[Emit analytics event]
```
302 vs 301: Prefer 302 (temporary) so you can change/expire links later. Control CDN/browser caching via `Cache-Control` headers.
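A sketch of the read path (L1 → L2 → DB, always answering with a 302), assuming an in-process dict for L1, a Redis-like `l2` client, a `db.get_link` lookup, and a non-blocking `emit` hook; all names are illustrative:

```python
import time

L1: dict[str, tuple[str, float]] = {}      # code -> (long_url, expiry on the monotonic clock)
L1_TTL = 0.3                               # micro-TTL, 300 ms

def resolve(code: str, l2, db, emit) -> tuple[int, dict]:
    now = time.monotonic()

    hit = L1.get(code)                     # 1) in-process cache
    if hit and hit[1] > now:
        long_url = hit[0]
    else:
        long_url = l2.get(f"s:{code}")     # 2) Redis
        if long_url is None:
            row = db.get_link(code)        # 3) primary store
            if row is None:
                # Negative-cache marker (a fuller version would check it before the DB hit).
                l2.set(f"nf:{code}", 1, ex=60)
                return 404, {}
            long_url = row.long_url
            l2.set(f"s:{code}", long_url, ex=3600)
        L1[code] = (long_url, now + L1_TTL)

    emit({"code": code, "ts": time.time()})        # fire-and-forget analytics
    return 302, {"Location": long_url,
                 "Cache-Control": "private, max-age=60"}
```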
Hash + Collision Resolution
```mermaid
flowchart TB
    subgraph Options
        X1[Counter + Base62] -->|"No collisions<br/>(shortest codes)"| Y1[Pros]
        X2[Snowflake → Base62] -->|"Distributed, unique"| Y2[Pros]
        X3[Random k-bit → Base62/64] -->|"Check-and-insert<br/>retry on conflict"| Y3[Pros]
        X4["KGS (pre-minted pool)"] -->|"Guaranteed unique<br/>for very short slugs"| Y4[Pros]
    end
    subgraph Collision Handling
        R1["Random token<br/>INSERT with UNIQUE(code)"]
        R1 -->|on conflict| R2[Regenerate & retry]
        R1 -->|success| R3[Done]
    end
```
Guidance:
- Base62/Base64 are encodings (≈6 bits/char), not uniqueness.
- For random tokens: choose ≥96 bits to make collisions negligible at multi-billion scale; enforce `UNIQUE(code)` and retry on the rare conflict (see the sketch after this list).
- For very short vanity codes (6–8 chars), use a KGS (Key Generation Service) that leases unused codes atomically.
- Snowflake avoids coordination; IDs are monotonic, so add salting to avoid range hotspots.
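A sketch of the random-token option: 96 bits from a CSPRNG, Base62-encoded, plus the birthday-bound arithmetic behind "negligible collisions"; the insert-and-retry loop itself is the same pattern shown in the shortening deep dive above:

```python
import secrets

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def random_code(bits: int = 96) -> str:
    """96 random bits -> ~16-17 Base62 chars."""
    n = secrets.randbits(bits)
    out = []
    while n:
        n, rem = divmod(n, 62)
        out.append(ALPHABET[rem])
    return "".join(reversed(out)) or ALPHABET[0]

def collision_probability(k: float, bits: int = 96) -> float:
    # Birthday bound: with k codes drawn from 2^bits values, P(any collision) ≈ k^2 / 2^(bits+1).
    return k * k / 2 ** (bits + 1)

print(len(random_code()))              # usually 16-17 chars
print(collision_probability(10e9))     # ≈ 6e-10 for 10 billion codes
```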
Caching Strategy
```mermaid
flowchart LR
    A[Client] --> B[CDN/Edge Cache]
    B --> C[API]
    C --> D{"L1 (process) cache?"}
    D -- hit --> E[302]
    D -- miss --> F{"L2 (Redis) cache?"}
    F -- hit --> E
    F -- miss --> G[(DB)]
    G --> F
    style E fill:#fff,stroke:#333
```
Practices:
- Negative caching for "not found" (short TTL) to deflect brute-force scans (sketch after this list).
- Stampede protection: per-key mutex or stale-while-revalidate.
- Hot key protection: replicate hot keys (N buckets), jittered TTLs, tiny L1 TTL (100–500 ms).
- TTL policy: per-code TTL if links expire; otherwise long-lived cache entries.
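A small sketch of two of these practices against a Redis-like client (negative caching with a short TTL, and jittered TTLs); the key prefix, sentinel value, and TTLs are illustrative, and the client is assumed to return strings (e.g. `decode_responses=True`):

```python
import random
from typing import Optional

NOT_FOUND = "__nf__"          # sentinel stored for negative caching

def cache_set(l2, code: str, long_url: Optional[str], base_ttl: int = 3600) -> None:
    if long_url is None:
        # Negative cache: short TTL so brute-force scans mostly stop at Redis, not the DB.
        l2.set(f"s:{code}", NOT_FOUND, ex=45)
        return
    # Jitter ±10% so replicated/hot keys don't all expire in the same second.
    ttl = int(base_ttl * random.uniform(0.9, 1.1))
    l2.set(f"s:{code}", long_url, ex=ttl)

def cache_get(l2, code: str):
    """Returns (found_in_cache, long_url_or_None)."""
    val = l2.get(f"s:{code}")
    if val is None:
        return False, None            # miss -> caller goes to the DB
    if val == NOT_FOUND:
        return True, None             # cached 404
    return True, val
```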
Storage & Sharding
```mermaid
flowchart LR
    H[Router] --> I{"Consistent Hash(code)"}
    I --> J1["(Shard 1<br/>Primary + Replicas)"]
    I --> J2["(Shard 2<br/>Primary + Replicas)"]
    I --> J3["(Shard 3<br/>Primary + Replicas)"]
```
Approaches:
- Hash-based sharding on `code` (uniform distribution for Snowflake/random/KGS); routing sketch after this list.
- Consistent hashing eases shard add/remove with minimal rebalancing.
- For monotonic IDs (counter/Snowflake), salt the key to avoid last-shard hotspots.
- Replication (e.g., RF=3) for HA. Reads from replicas; writes to primary or quorum (KV).
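A sketch of salted hash routing (plain hash-mod here for brevity; a consistent-hash ring is sketched under Sharding (Details) below); the shard count and salt string are illustrative:

```python
import hashlib

NUM_SHARDS = 16
SALT = "linksvc-v1"      # any fixed string; breaks up monotonic counter/Snowflake ranges

def shard_for(code: str) -> int:
    digest = hashlib.sha1(f"{SALT}:{code}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

print(shard_for("aZ3kQ9x"), shard_for("aZ3kQ9y"))   # adjacent codes land on unrelated shards
```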
Scaling & Hot Key Protection
```mermaid
flowchart TB
    subgraph Hot Link Path
        A[Viral code] --> B[CDN 30-120s TTL]
        B --> C[L1 micro-TTL]
        C --> D["L2 replicated keys<br/>s:code#0..N-1"]
        D --> E[Request coalescing<br/>+ SWR]
    end
```
Tactics: CDN first; L1 micro‑cache; replicate hot keys; per-key locks; stale‑while‑revalidate; adaptive TTL/jitter; rate limits per IP/ASN for abuse waves.
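A sketch of the replicated-hot-key tactic (`s:code#0..N-1`): writes fan out to N copies with jittered TTLs, and each reader hashes onto one copy so no single cache node absorbs the whole spike; the copy count and TTLs are illustrative:

```python
import random
import zlib

HOT_COPIES = 8                      # N cache copies per hot key

def set_hot(l2, code: str, long_url: str, ttl: int = 120) -> None:
    for i in range(HOT_COPIES):
        # Jitter each copy's TTL so the copies don't all expire (and stampede) together.
        l2.set(f"s:{code}#{i}", long_url, ex=ttl + random.randint(0, 30))

def get_hot(l2, code: str, client_id: str):
    # Spread readers across copies with a cheap, stable hash of the caller.
    idx = zlib.crc32(client_id.encode()) % HOT_COPIES
    return l2.get(f"s:{code}#{idx}")
```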
Analytics & Backpressure
```mermaid
sequenceDiagram
    participant API as Redirect Handler
    participant LQ as Bounded Ring Buffer (Emitter)
    participant AG as Local Agent/Producer
    participant MQ as Kafka/PubSub
    API->>API: Resolve code → 302 (fast)
    API->>LQ: try_push(event) (non-blocking)
    Note over API,LQ: If full → drop/sampled
    LQ->>AG: batch flush
    AG->>MQ: produce acks=1 (normal mode)
    AG->>MQ: produce acks=0 when stalled
```
Backpressure plan:
- Sample before enqueue (Bernoulli/stratified).
- Bounded, non-blocking emitter queue; drop-if-full with metrics (emitter sketch after this list).
- Adaptive modes: NORMAL → DEGRADED → STALLED (e.g., keep 10%).
- Use a lite payload under pressure; never block the 302.
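A minimal sketch of the bounded, non-blocking emitter (sample → `try_push` → background batch flush), assuming a hypothetical `producer.send_batch` wrapper over the Kafka/PubSub client; capacities and batch sizes are illustrative:

```python
import queue
import random
import threading

class AnalyticsEmitter:
    def __init__(self, producer, capacity: int = 10_000, sample_rate: float = 1.0):
        self.q = queue.Queue(maxsize=capacity)   # bounded buffer
        self.producer = producer                 # assumed Kafka/PubSub producer wrapper
        self.sample_rate = sample_rate           # lowered in DEGRADED/STALLED modes
        self.dropped = 0
        threading.Thread(target=self._flush_loop, daemon=True).start()

    def try_push(self, event: dict) -> None:
        """Called on the redirect path; must never block."""
        if random.random() > self.sample_rate:   # Bernoulli sampling before enqueue
            return
        try:
            self.q.put_nowait(event)
        except queue.Full:
            self.dropped += 1                    # drop-if-full, surfaced as a metric

    def _flush_loop(self) -> None:
        while True:
            batch = [self.q.get()]               # block only off the hot path
            while len(batch) < 500:
                try:
                    batch.append(self.q.get_nowait())
                except queue.Empty:
                    break
            self.producer.send_batch(batch)      # acks=1 normally, acks=0 when stalled
```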
Capacity Planning (ByteByteGo-style checklist)
- Assumptions: peak RPS (reads, writes), growth rate, link lifetime, % hot vs long tail.
- Storage: rows per day × bytes/row (code + URL + metadata) × retention.
- Throughput: cache hit ratio target (e.g., >95%); DB QPS budget on misses.
- Latency goals: origin p99 < 20 ms; cache p99 < 5 ms; creation p99 < 50 ms.
- Availability: multi-AZ, multi-region read-anywhere; RPO/RTO targets (worked example after this list).
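A back-of-envelope pass with assumed numbers (100 M new links/day, ~500 B/row, 5-year retention, 10:1 read:write, 3× peak factor), just to show the arithmetic:

```python
writes_per_day = 100_000_000          # assumed new links/day
bytes_per_row = 500                   # code + URL + metadata (assumed average)
retention_years = 5
storage_tb = writes_per_day * bytes_per_row * 365 * retention_years / 1e12
print(f"storage ≈ {storage_tb:.0f} TB over {retention_years} years")   # ≈ 91 TB

write_qps = writes_per_day / 86_400                  # ≈ 1.2K writes/s average
read_qps_peak = write_qps * 10 * 3                   # 10:1 read:write, 3x peak factor
cache_hit = 0.95
db_read_qps = read_qps_peak * (1 - cache_hit)        # the DB only sees the misses
print(f"peak reads ≈ {read_qps_peak:,.0f}/s, DB misses ≈ {db_read_qps:,.0f}/s")
```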
API Reference (sketch)
- `POST /v1/shorten` with body `{ "long_url": "...", "alias": "optional", "ttl_days": 30 }`
- `GET /v1/r/{code}` → `302 Location: <long_url>`
- `PUT /v1/links/{code}` with body `{ "long_url": "...", "ttl_days": 60 }`
- `DELETE /v1/links/{code}`
- `GET /v1/links/{code}/stats?window=1d`
Trade-offs Summary
- Counter vs Snowflake vs Random vs KGS: shortest vs distributed vs private vs guaranteed-short.
- 302 vs 301: flexibility vs aggressive client caching.
- KV vs SQL: simple scale vs richer queries/constraints.
- Edge analytics vs origin: lowest latency vs simplicity.
Deep Dives
ID Generation (Details)
```mermaid
flowchart TB
    A[Need unique code] --> B{Strategy}
    B --> C[Counter + Base62]
    B --> D[Snowflake → Base62]
    B --> E[Random ≥96 bits]
    B --> F[KGS pre-mint]
    E --> G{"INSERT UNIQUE(code)"}
    G -->|conflict| E
    G -->|ok| H[Done]
```
- Base62 alphabet avoids `+/=` and is URL-friendly.
- Random ≥96 bits ⇒ ~16 base62 chars, collision risk negligible; still keep UNIQUE and retry.
- Snowflake: timestamp + machine + sequence (generator sketch after this list).
- KGS: pool refilled in background; issue via atomic flip from `available → used`.
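A simplified Snowflake-style generator (41-bit millisecond timestamp, 10-bit machine ID, 12-bit sequence) followed by Base62 encoding; the custom epoch is an assumption for illustration:

```python
import threading
import time

EPOCH_MS = 1_600_000_000_000        # custom epoch (assumed), Sep 2020
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

class Snowflake:
    def __init__(self, machine_id: int):
        assert 0 <= machine_id < 1024          # 10-bit machine ID
        self.machine_id = machine_id
        self.sequence = 0
        self.last_ms = -1
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            now = int(time.time() * 1000) - EPOCH_MS
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & 0xFFF   # 12-bit sequence
                if self.sequence == 0:                        # exhausted this ms: spin to the next
                    while now <= self.last_ms:
                        now = int(time.time() * 1000) - EPOCH_MS
            else:
                self.sequence = 0
            self.last_ms = now
            return (now << 22) | (self.machine_id << 12) | self.sequence

def to_base62(n: int) -> str:
    out = []
    while n:
        n, rem = divmod(n, 62)
        out.append(ALPHABET[rem])
    return "".join(reversed(out)) or "0"

print(to_base62(Snowflake(machine_id=1).next_id()))   # ≈ 10-11 chars
```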
Caching (Details)
- Negative cache "not found" for 30–60 s.
- Stampede control via per-key lock (Redis `SETNX`) or SWR (lock sketch after this list).
- Hot keys: replicate N copies; client hashes to pick a copy; micro L1 to flatten spikes.
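A sketch of the per-key lock using Redis `SET ... NX EX` (the modern form of `SETNX`): the first caller on a miss rebuilds the entry, the rest wait briefly and re-read; `load_from_db`, key names, and timeouts are illustrative:

```python
import time

def get_or_rebuild(l2, code: str, load_from_db, ttl: int = 3600):
    key, lock = f"s:{code}", f"lock:{code}"
    val = l2.get(key)
    if val is not None:
        return val
    # SET lock 1 NX EX 5 -> only the first caller on a missing key wins the lock.
    if l2.set(lock, "1", nx=True, ex=5):
        try:
            val = load_from_db(code)
            l2.set(key, val, ex=ttl)
            return val
        finally:
            l2.delete(lock)
    # Lost the race: wait briefly for the winner to repopulate, then re-read.
    for _ in range(10):
        time.sleep(0.05)
        val = l2.get(key)
        if val is not None:
            return val
    return load_from_db(code)       # fallback if the winner died mid-rebuild
```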
Sharding (Details)
- Consistent hashing ring maps `hash(code)` → shard; small movement on reshard (ring sketch after this list).
- Replication: quorum reads/writes for HA (KV); primary/replica in SQL.
- Migrations: dual-write during reshard; shadow reads to validate.
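A minimal consistent-hash ring with virtual nodes, assuming shard names like `shard-1`; a real system would use a vetted library, this just shows why adding a shard moves only a small slice of keys:

```python
import bisect
import hashlib

class Ring:
    def __init__(self, shards: list[str], vnodes: int = 100):
        self.points: list[tuple[int, str]] = []
        for shard in shards:
            for i in range(vnodes):                      # virtual nodes smooth the distribution
                self.points.append((self._h(f"{shard}#{i}"), shard))
        self.points.sort()
        self.keys = [p for p, _ in self.points]

    @staticmethod
    def _h(s: str) -> int:
        return int.from_bytes(hashlib.sha1(s.encode()).digest()[:8], "big")

    def shard_for(self, code: str) -> str:
        idx = bisect.bisect(self.keys, self._h(code)) % len(self.keys)
        return self.points[idx][1]

ring = Ring(["shard-1", "shard-2", "shard-3"])
print(ring.shard_for("aZ3kQ9x"))
# Adding "shard-4" remaps only ~1/4 of the keyspace instead of reshuffling everything.
```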
Hot Key Protection (Details)
- CDN with cautious TTL to retain control (use 302).
- Jittered TTLs so replicas don't expire simultaneously.
- Adaptive promotion: pin hot keys in memory during spikes.
Analytics & Backpressure (Details)
- Emitter-side sampling + bounded ring buffer ensures no blocking.
- Modes: `NORMAL (100%)`, `DEGRADED (50–80%)`, `STALLED (10%)`, `RECOVERY (ramp-up)` (mode-controller sketch after this list).
- Metrics: queue depth, producer p99, send error rate; enforce floors per tenant (stratified sampling).
- Uniqueness/HLL: HyperLogLog for unique visitors; windowed aggregates for dashboards.
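A sketch of the mode switch driven by queue fill and send errors; the thresholds are illustrative, and the sampling rates follow the percentages listed above:

```python
RATES = {"NORMAL": 1.0, "DEGRADED": 0.7, "STALLED": 0.1, "RECOVERY": 0.3}

def next_mode(mode: str, queue_fill: float, send_error_rate: float) -> str:
    """Inputs are 0..1 fractions; call periodically from the emitter's flush loop."""
    if queue_fill > 0.9 or send_error_rate > 0.2:
        return "STALLED"
    if queue_fill > 0.6:
        return "DEGRADED"
    if mode == "STALLED":
        return "RECOVERY"                 # ramp back up instead of jumping straight to 100%
    if mode == "RECOVERY" and queue_fill > 0.3:
        return "RECOVERY"                 # hold until the backlog clears
    return "NORMAL"

mode = next_mode("NORMAL", queue_fill=0.75, send_error_rate=0.01)
print(mode, RATES[mode])                  # DEGRADED 0.7
```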
Quick Interview Talk Track (2–3 minutes)
- “Reads dominate. I front with a CDN and use 302 so I can change/expire links. Redirect path: L1 → L2 → DB with negative caching and stampede control. For IDs, I’d choose Snowflake→base62 or random‑96b + UNIQUE; KGS if we need short vanity codes. Storage is sharded by consistent hash(code) with replication. Hot links get replicated cache keys and micro L1 TTL to avoid hotspots. Analytics is strictly async with emitter‑side sampling, bounded queues, and p99‑driven shedding, so redirect p99 stays < 20ms even under spikes.”