🎯 Design URL Shortener
1️⃣ Core Framework
When discussing URL Shortener design, I frame it as:
- API design and core user flows
- Short code generation strategy
- Storage and data modeling
- Redirect path optimization
- Trade-offs: uniqueness vs latency vs availability
- Scaling, caching, analytics, and abuse prevention
2️⃣ Core Requirements
Functional Requirements
- Create a short URL from a long URL
- Redirect users from short URL to original URL
- Support custom aliases
- Support expiration time
- Track basic analytics
Non-functional Requirements
- Very low redirect latency
- High availability
- Globally scalable reads
- Strong uniqueness for short codes
- Abuse prevention for malicious links
👉 Interview Answer
A URL shortener has two main flows: creating a short URL and redirecting users to the original URL.
Redirects far outnumber creations, so the system is read-heavy and I would optimize the redirect path for low latency and high availability.
I would also consider uniqueness, expiration, analytics, and abuse prevention as important production-level requirements.
3️⃣ API Design
Create Short URL
POST /api/urls
Request:
{
  "longUrl": "https://example.com/some/very/long/path",
  "customAlias": "my-link",
  "expiresAt": "2026-12-31T00:00:00Z"
}
Response:
{
  "shortUrl": "https://short.ly/abc123",
  "shortCode": "abc123"
}
Redirect
GET /{shortCode}
Behavior:
- Look up short code
- Validate expiration
- Redirect to long URL using 301 or 302
Analytics
GET /api/urls/{shortCode}/stats
👉 Interview Answer
I would expose a write API for creating short URLs and a read API for redirecting users.
The redirect API should be extremely lightweight, because it is the critical path and usually has much higher traffic.
4️⃣ Short Code Generation
Option 1: Hash Long URL
Example:
hash(longUrl) → shortCode
Pros:
- Simple
- Same long URL can generate same short code
Cons:
- Collision possible
- Harder to support custom aliases
- Hash output may be too long
Option 2: Random Code
Example:
random base62 string → abc123
Pros:
- Easy to generate
- Good distribution
Cons:
- Need collision check
- Retry required on conflict
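A minimal sketch of the random-code option with a bounded retry loop. The `store` dict stands in for the database, and the 7-character length and 5-attempt limit are illustrative assumptions, not values from these notes.

```python
import secrets

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"


def random_code(length: int = 7) -> str:
    """Return a random Base62 string drawn from a cryptographically strong source."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def create_unique_code(store: dict, long_url: str,
                       length: int = 7, max_attempts: int = 5) -> str:
    """Save long_url under a fresh random code, retrying on collision.

    `store` is a stand-in for the database. A real system would rely on a
    unique-key insert failing rather than this check-then-set, which is
    not safe under concurrent writers.
    """
    for _ in range(max_attempts):
        code = random_code(length)
        if code not in store:          # collision check
            store[code] = long_url
            return code
    raise RuntimeError("could not find a free short code")
```

The retry limit matters: as the code space fills up, collisions become more frequent, so an unbounded loop would hide the problem instead of surfacing it.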
Option 3: Auto-increment ID + Base62
Example:
ID = 125000
Base62(ID) = ww8 (using the alphabet 0-9, a-z, A-Z)
Pros:
- Guaranteed uniqueness
- Short and compact
- Easy to reason about
Cons:
- Centralized ID generation can become a bottleneck
- Predictable codes unless obfuscated
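The ID-to-code step can be sketched as a plain Base62 encoder and decoder. The alphabet order (digits, lowercase, uppercase) is an assumption; any fixed ordering of the 62 characters works as long as encode and decode agree.

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
BASE = len(ALPHABET)  # 62


def base62_encode(n: int) -> str:
    """Encode a non-negative integer as a Base62 string."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, BASE)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))  # most significant digit first


def base62_decode(code: str) -> int:
    """Decode a Base62 string back to the integer it represents."""
    n = 0
    for ch in code:
        n = n * BASE + ALPHABET.index(ch)
    return n
```

With this alphabet, `base62_encode(125000)` returns `"ww8"`. Seven characters cover 62^7 = 3,521,614,606,208 distinct IDs.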
Recommended Approach
Use:
Distributed ID Generator → Base62 Encode → shortCode
Examples:
- Snowflake ID
- Database sequence with range allocation
- ID generation service
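A rough sketch of a Snowflake-style generator, assuming the commonly cited layout of 41 timestamp bits, 10 worker bits, and 12 sequence bits. The custom epoch and the injectable `clock` parameter are assumptions added for illustration and testing.

```python
import threading
import time


class SnowflakeGenerator:
    EPOCH_MS = 1_577_836_800_000   # 2020-01-01T00:00:00Z, an arbitrary custom epoch
    WORKER_BITS = 10
    SEQ_BITS = 12

    def __init__(self, worker_id: int, clock=lambda: int(time.time() * 1000)):
        if not 0 <= worker_id < (1 << self.WORKER_BITS):
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self._clock = clock          # returns current time in milliseconds
        self._lock = threading.Lock()
        self._last_ms = -1
        self._seq = 0

    def next_id(self) -> int:
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                # Same millisecond: bump the per-worker sequence.
                self._seq = (self._seq + 1) & ((1 << self.SEQ_BITS) - 1)
                if self._seq == 0:
                    # Sequence exhausted: spin until the next millisecond.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._seq = 0
            self._last_ms = now
            return (((now - self.EPOCH_MS) << (self.WORKER_BITS + self.SEQ_BITS))
                    | (self.worker_id << self.SEQ_BITS)
                    | self._seq)
```

One trade-off worth stating in an interview: a 63-bit ID can need up to 11 Base62 characters, so Snowflake-style codes are longer than codes derived from a dense counter with range allocation.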
👉 Interview Answer
I would use a distributed ID generator and encode the generated ID using Base62.
This gives us uniqueness, compact short codes, and avoids repeated collision checks.
If custom aliases are supported, I would enforce uniqueness through a database constraint.
Core Insight
The hardest part of short code generation is not encoding — it is guaranteeing uniqueness at scale.
5️⃣ Data Model
URL Mapping Table
CREATE TABLE url_mapping (
  short_code VARCHAR PRIMARY KEY,  -- lookup key on the redirect path
  long_url TEXT NOT NULL,
  user_id VARCHAR,
  created_at TIMESTAMP,
  expires_at TIMESTAMP,
  status VARCHAR
);
Analytics Table
CREATE TABLE url_click_event (
  event_id VARCHAR PRIMARY KEY,
  short_code VARCHAR,  -- the code that was clicked
  clicked_at TIMESTAMP,
  user_agent TEXT,
  ip_hash VARCHAR,  -- hashed, not the raw IP address
  country VARCHAR,
  referrer TEXT
);
Why Separate Mapping and Analytics?
- Mapping table is latency-sensitive
- Analytics is write-heavy
- Analytics can be async
- Avoid slowing down redirect path
👉 Interview Answer
I would separate the URL mapping table from analytics events.
The mapping table serves the redirect path and must be optimized for low latency.
Analytics events can be written asynchronously so they do not affect user-facing redirect performance.
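To make the uniqueness constraint concrete, here is a small sketch using Python's built-in `sqlite3` module with an in-memory database. SQLite and the `'active'` status value are assumptions for illustration; the point is that the primary key, not application code, rejects a duplicate short code or custom alias.

```python
import sqlite3

SCHEMA = """
CREATE TABLE url_mapping (
    short_code TEXT PRIMARY KEY,
    long_url   TEXT NOT NULL,
    user_id    TEXT,
    created_at TEXT,
    expires_at TEXT,
    status     TEXT
);
"""


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with the mapping table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def insert_mapping(conn: sqlite3.Connection, short_code: str,
                   long_url: str, expires_at=None) -> bool:
    """Insert a mapping; return False if short_code is already taken."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO url_mapping "
                "(short_code, long_url, created_at, expires_at, status) "
                "VALUES (?, ?, datetime('now'), ?, 'active')",
                (short_code, long_url, expires_at),
            )
        return True
    except sqlite3.IntegrityError:
        # The primary-key constraint rejected a duplicate code or alias.
        return False
```

Letting the insert fail is safer than a separate "does it exist?" query, because two concurrent requests for the same alias cannot both succeed.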
6️⃣ Redirect Flow
Basic Flow
- User visits short URL
- Load balancer routes request
- Redirect service extracts short code
- Check cache
- If cache miss, query database
- Validate status and expiration
- Return HTTP redirect
301 vs 302
| Redirect Type | Meaning | Use Case |
|---|---|---|
| 301 | Permanent redirect | Better for static links |
| 302 | Temporary redirect | Better for analytics/control |
Recommended
Use 302 by default.
Why?
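- Every click reaches our servers, so analytics stay accurate
- Allows changing the destination URL later
- Avoids stale redirects: clients may cache a 301 indefinitely, while a 302 is not cached by default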
👉 Interview Answer
I would use 302 redirects by default, because they give us more control over analytics, expiration, and destination changes.
If we use 301, browsers and clients may cache the redirect, making future changes harder.
7️⃣ Caching Strategy
Cache What?
shortCode → longUrl
Cache Layers
- CDN / edge cache
- Redis / Memcached
- Local in-memory cache for hot links
Cache Challenges
- Expired links
- Updated destination URLs
- Abuse blocking
- Cache invalidation
Recommended Strategy
- Cache active mappings
- Set the cache TTL no longer than the time remaining until the link expires
- Invalidate cache on update/delete/block
- Use negative caching for missing short codes
👉 Interview Answer
Since redirects are read-heavy, caching is one of the most important optimizations.
I would cache shortCode-to-longUrl mappings in Redis and optionally at the edge for very hot links.
However, I need careful TTL and invalidation logic to handle expiration, updates, and abuse blocking.
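The strategy above can be sketched as a small cache-aside wrapper. This is an in-process stand-in for Redis; the `load` callback, the TTL constants, and the use of epoch seconds for `expires_at` are all assumptions made for the example.

```python
import time
from typing import Callable, Dict, Optional, Tuple

DEFAULT_TTL = 3600.0    # seconds to keep a live mapping (assumed value)
NEGATIVE_TTL = 30.0     # seconds to remember "not found" (assumed value)

Record = Tuple[str, Optional[float]]  # (long_url, expires_at in epoch seconds)


class MappingCache:
    def __init__(self, load: Callable[[str], Optional[Record]],
                 clock: Callable[[], float] = time.time):
        self._load = load      # database lookup, returns None if missing
        self._clock = clock
        self._entries: Dict[str, Tuple[Optional[str], float]] = {}

    def get(self, short_code: str) -> Optional[str]:
        now = self._clock()
        hit = self._entries.get(short_code)
        if hit is not None and hit[1] > now:
            return hit[0]                      # fresh hit (may be a negative entry)
        record = self._load(short_code)
        if record is None or (record[1] is not None and record[1] <= now):
            # Missing or expired: cache the miss briefly to shield the database.
            self._entries[short_code] = (None, now + NEGATIVE_TTL)
            return None
        long_url, expires_at = record
        deadline = now + DEFAULT_TTL
        if expires_at is not None:
            deadline = min(deadline, expires_at)   # never outlive the link
        self._entries[short_code] = (long_url, deadline)
        return long_url

    def invalidate(self, short_code: str) -> None:
        """Call on update, delete, or block so the next read hits the database."""
        self._entries.pop(short_code, None)
```

Keeping the negative TTL short limits how long a newly created alias could be shadowed by an earlier "not found" entry.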
8️⃣ Trade-offs
Uniqueness vs Simplicity
- Random code is simple but needs collision handling
- ID-based code guarantees uniqueness but is predictable (enumerable) unless obfuscated
Latency vs Analytics Accuracy
- Synchronous analytics is accurate but slower
- Async analytics is faster but may lose some events
Availability vs Consistency
- Redirect path should favor availability
- Creation path needs stronger consistency for uniqueness
Custom Alias vs Collision Risk
- Custom alias improves UX
- But requires uniqueness checks and a reserved-word list
👉 Interview Answer
The main trade-offs are around uniqueness, latency, and availability.
For URL creation, I need strong uniqueness guarantees. For redirects, I prioritize low latency and high availability.
Analytics should usually be asynchronous because it should not slow down the redirect path.
9️⃣ Scaling Patterns
Pattern 1: Read-heavy Optimization
Redirect traffic is much higher than creation traffic.
Use:
- Cache
- CDN
- Read replicas
- Global routing
Pattern 2: Distributed ID Generation
Avoid single database bottleneck.
Use:
- Snowflake-style IDs
- Segment allocation
- Dedicated ID service
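Segment (range) allocation can be sketched as follows. `RangeAllocator` is a hypothetical stand-in for the central store, for example a database row updated atomically, and the block size is an assumed value.

```python
import threading
from typing import Tuple


class RangeAllocator:
    """Stand-in for the central store that hands out disjoint ID blocks."""

    def __init__(self, block_size: int = 1000):
        self._next_start = 0
        self._block_size = block_size
        self._lock = threading.Lock()

    def allocate(self) -> Tuple[int, int]:
        """Return a half-open range [start, end) that no one else receives."""
        with self._lock:
            start = self._next_start
            self._next_start += self._block_size
            return start, start + self._block_size


class IdWorker:
    """Serves IDs from a local block and contacts the allocator only when it runs out."""

    def __init__(self, allocator: RangeAllocator):
        self._allocator = allocator
        self._next, self._end = allocator.allocate()
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:
            if self._next >= self._end:
                self._next, self._end = self._allocator.allocate()
            value = self._next
            self._next += 1
            return value
```

Each worker touches the central store once per block instead of once per ID. The cost is that IDs left in a block are lost if a worker crashes, which leaves gaps but never duplicates.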
Pattern 3: Async Analytics
Redirect path:
redirect request → return redirect → publish click event async
Analytics pipeline:
Kafka / Queue → Stream Processing → Analytics DB
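A minimal sketch of the fire-and-forget publish step, using an in-process bounded queue as a stand-in for Kafka. `ClickPublisher`, `sink`, and the queue size are hypothetical names and values chosen for the example.

```python
import queue
import threading
from typing import Callable


class ClickPublisher:
    def __init__(self, sink: Callable[[dict], None], max_pending: int = 10_000):
        self._queue: "queue.Queue[dict]" = queue.Queue(maxsize=max_pending)
        self._sink = sink            # e.g. a producer send call in a real pipeline
        self.dropped = 0
        threading.Thread(target=self._run, daemon=True).start()

    def publish(self, event: dict) -> bool:
        """Never blocks the redirect path; drops the event if the buffer is full."""
        try:
            self._queue.put_nowait(event)
            return True
        except queue.Full:
            self.dropped += 1
            return False

    def _run(self) -> None:
        while True:
            event = self._queue.get()
            try:
                self._sink(event)
            except Exception:
                pass  # a real pipeline would retry or dead-letter here
            finally:
                self._queue.task_done()

    def flush(self) -> None:
        """Block until every queued event has been handed to the sink."""
        self._queue.join()
```

Dropping on a full buffer is the concrete form of the "may lose some events" trade-off: the redirect stays fast and the `dropped` counter makes the loss observable.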
Pattern 4: Database Sharding
Shard by:
- short_code hash
- user_id
- creation time for analytics
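Sharding by short_code hash can be sketched in a few lines. SHA-256 and simple modulo placement are assumptions for the example; the one firm requirement is a hash that is stable across processes, which Python's built-in `hash()` on strings is not.

```python
import hashlib


def shard_for(short_code: str, num_shards: int) -> int:
    """Map a short code to a shard index in [0, num_shards)."""
    digest = hashlib.sha256(short_code.encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer, then reduce modulo the shard count.
    return int.from_bytes(digest[:8], "big") % num_shards
```

Modulo placement remaps most keys when `num_shards` changes, so a system that expects to add shards often would likely use consistent hashing or a fixed number of virtual shards instead.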
Pattern 5: Multi-region Deployment
For global scale:
- Deploy redirect service near users
- Replicate URL mappings
- Use eventual consistency for reads
- Use single-region or strongly coordinated writes for creation
👉 Interview Answer
At scale, I would optimize the read-heavy redirect path first.
I would use caching, distributed ID generation, asynchronous analytics, and database sharding.
For global traffic, I would deploy redirect services in multiple regions and replicate URL mappings close to users.
🔟 Failure Handling & Edge Cases
Common Failures
- Short code not found
- Expired URL
- Database unavailable
- Stale cache entries
- Analytics pipeline down
- Malicious URL detected
Strategies
- Return 404 for missing code
- Return 410 for expired link
- Serve from cache during DB failure
- Retry analytics asynchronously
- Block unsafe URLs
- Rate limit suspicious users
👉 Interview Answer
The redirect path should degrade gracefully.
If analytics is down, redirects should still work. If the database is temporarily unavailable, we may still serve hot links from cache.
For missing or expired URLs, we return clear error responses like 404 or 410.
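The status-code decisions can be sketched as one pure function. The `Mapping` type and the rule that any status other than `"active"` is answered with 410 are assumptions for illustration; the notes only specify 404 for missing codes and 410 for expired links.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Optional, Tuple


@dataclass
class Mapping:
    long_url: str
    expires_at: Optional[datetime] = None
    status: str = "active"


def resolve(mapping: Optional[Mapping], now: datetime) -> Tuple[int, Dict[str, str]]:
    """Return (HTTP status, response headers) for a redirect lookup."""
    if mapping is None:
        return 404, {}                       # short code not found
    if mapping.status != "active":
        return 410, {}                       # assumption: blocked or deleted links are "gone"
    if mapping.expires_at is not None and mapping.expires_at <= now:
        return 410, {}                       # expired link
    return 302, {"Location": mapping.long_url}
```

Keeping this logic free of I/O makes it easy to test and lets the same function run behind the cache, the database path, or an edge worker.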
1️⃣1️⃣ Security & Abuse Prevention
Risks
- Phishing links
- Malware URLs
- Spam generation
- Brute-force scanning of short codes
Protection
- Rate limiting
- CAPTCHA for suspicious users
- Safe browsing checks
- Domain blocklist
- Reserved aliases
- Non-sequential or obfuscated codes
👉 Interview Answer
URL shorteners are often abused for phishing and spam, so security is a core part of the design.
I would add rate limiting, malicious URL detection, domain blocklists, and monitoring for suspicious traffic patterns.
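Rate limiting is often implemented as a token bucket. The sketch below is one such limiter, assuming one bucket per client key (user ID or IP hash) held in memory; a multi-node deployment would need shared state, which is out of scope here.

```python
import time
from typing import Callable


class TokenBucket:
    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate              # tokens refilled per second
        self.capacity = capacity      # maximum burst size
        self._clock = clock
        self._tokens = float(capacity)
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Return True and spend tokens if the request fits the budget."""
        now = self._clock()
        # Refill in proportion to elapsed time, capped at capacity.
        self._tokens = min(self.capacity,
                           self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False
```

For example, `TokenBucket(rate=1.0, capacity=10)` lets a client create 10 links in a burst and then about one per second. The same limiter applied to 404 responses slows brute-force scanning of short codes.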
1️⃣2️⃣ End-to-End Flow
Create Flow
- User submits long URL
- Validate URL
- Check abuse rules
- Generate unique ID
- Encode ID to Base62 short code
- Save mapping
- Return short URL
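The validation steps at the start of the create flow might look like the sketch below. The reserved-alias set and the alias pattern (3 to 32 characters of letters, digits, `_`, and `-`) are hypothetical choices, not rules stated in these notes.

```python
import re
from urllib.parse import urlsplit

RESERVED_ALIASES = {"api", "admin", "login", "stats"}   # hypothetical list
ALIAS_PATTERN = re.compile(r"[A-Za-z0-9_-]{3,32}")      # hypothetical rule


def validate_long_url(url: str) -> bool:
    """Accept only absolute http(s) URLs that include a host."""
    try:
        parts = urlsplit(url)
    except ValueError:        # e.g. a malformed IPv6 literal
        return False
    return parts.scheme in ("http", "https") and bool(parts.hostname)


def validate_alias(alias: str) -> bool:
    """Accept well-formed aliases that do not shadow reserved paths."""
    return (ALIAS_PATTERN.fullmatch(alias) is not None
            and alias.lower() not in RESERVED_ALIASES)
```

Restricting the scheme to http and https rejects inputs such as `javascript:` URLs before they reach the abuse checks, and the reserved list keeps aliases from colliding with routes like `/api`.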
Redirect Flow
- User opens short URL
- Extract short code
- Check cache
- Query DB on cache miss
- Validate expiration/status
- Emit analytics event async
- Return 302 redirect
Key Insight
A URL shortener is simple on the surface, but the real design challenge is building a low-latency, highly available redirect system.
🧠 Staff-Level Answer (Final)
👉 Interview Answer (Full Version)
When designing a URL shortener, I think of it as two main flows: URL creation and URL redirection.
The creation flow needs to generate globally unique short codes, usually by using a distributed ID generator and encoding the ID with Base62.
The redirect flow handles far more traffic than creation and is read-heavy, so I would optimize it with caching, read replicas, and potentially edge deployment.
I would store the core shortCode-to-longUrl mapping separately from analytics data, because redirects must stay low-latency while analytics can be processed asynchronously.
The main trade-offs are uniqueness, latency, availability, freshness, and analytics accuracy.
At scale, I would use distributed ID generation, cache hot links, shard the database, and send click events through an async pipeline.
Ultimately, the goal is to provide fast and reliable redirects while maintaining uniqueness, security, and observability.
⭐ Final Insight
A URL shortener is not just about making URLs shorter — it is about building a highly available redirect system with globally unique identifiers.
Implement