d&d-t System Design Deep Dive

🎯 Design URL Shortener

1️⃣ Core Framework

When discussing URL Shortener design, I frame it as:

API design and core user flows
Short code generation strategy
Storage and data modeling
Redirect path optimization
Trade-offs: uniqueness vs latency vs availability
Scaling, caching, analytics, and abuse prevention

2️⃣ Core Requirements

Functional Requirements

Create a short URL from a long URL
Redirect users from short URL to original URL
Support custom aliases
Support expiration time
Track basic analytics

Non-functional Requirements

Very low redirect latency
High availability
Globally scalable reads
Strong uniqueness for short codes
Abuse prevention for malicious links

👉 Interview Answer

A URL shortener has two main flows: creating a short URL and redirecting users to the original URL.

Redirects vastly outnumber creations, so the system is read-heavy and I would optimize the redirect path for low latency and high availability.

I would also consider uniqueness, expiration, analytics, and abuse prevention as important production-level requirements.

3️⃣ API Design

Create Short URL

POST /api/urls

Request:

{
  "longUrl": "https://example.com/some/very/long/path",
  "customAlias": "my-link",
  "expiresAt": "2026-12-31T00:00:00Z"
}

Response:

{
  "shortUrl": "https://short.ly/abc123",
  "shortCode": "abc123"
}
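
The request shape above implies a few checks before anything is stored. A minimal sketch in Python, assuming a hypothetical `parse_create_request` helper, http/https-only targets, and an alias rule of 3 to 32 letters, digits, `-` or `_` (none of these limits come from the notes; they are placeholders):

```python
import re
from datetime import datetime, timezone
from urllib.parse import urlsplit

# Assumed alias rule (hypothetical): 3-32 chars of letters, digits, '-' or '_'.
ALIAS_RE = re.compile(r"[A-Za-z0-9_-]{3,32}")


def parse_create_request(body: dict, now: datetime) -> dict:
    """Validate a POST /api/urls body; raise ValueError on bad input."""
    long_url = body.get("longUrl", "")
    parts = urlsplit(long_url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError("longUrl must be an absolute http(s) URL")

    alias = body.get("customAlias")
    if alias is not None and not ALIAS_RE.fullmatch(alias):
        raise ValueError("customAlias has an invalid format")

    expires_at = None
    raw = body.get("expiresAt")
    if raw is not None:
        # Normalize a trailing 'Z' so older Python versions can parse it.
        expires_at = datetime.fromisoformat(raw.replace("Z", "+00:00"))
        if expires_at.tzinfo is None:
            raise ValueError("expiresAt must include a timezone")
        if expires_at <= now:
            raise ValueError("expiresAt must be in the future")

    return {"long_url": long_url, "alias": alias, "expires_at": expires_at}


example = parse_create_request(
    {
        "longUrl": "https://example.com/some/very/long/path",
        "customAlias": "my-link",
        "expiresAt": "2026-12-31T00:00:00Z",
    },
    now=datetime(2026, 1, 1, tzinfo=timezone.utc),
)
```

Passing `now` in explicitly keeps the expiration check easy to test.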

Redirect

GET /{shortCode}

Behavior:

Look up short code
Validate expiration
Redirect to long URL using 301 or 302

Analytics

GET /api/urls/{shortCode}/stats

👉 Interview Answer

I would expose a write API for creating short URLs and a read API for redirecting users.

The redirect API should be extremely lightweight, because it is the critical path and usually has much higher traffic.

4️⃣ Short Code Generation

Option 1: Hash Long URL

Example:

hash(longUrl) → shortCode

Pros:

Simple
Same long URL can generate same short code

Cons:

Collision possible
Harder to support custom aliases
Hash output is too long to use directly, and truncating it raises the collision probability
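
A rough sketch of Option 1, assuming SHA-256 truncated to 40 bits and a salt that is bumped on collision. The truncation width, the salt scheme, and the `assign_code` helper are illustrative choices, and the dict stands in for the database:

```python
import hashlib

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"


def base62(n: int) -> str:
    """Encode a non-negative integer with the 62-character alphabet above."""
    if n == 0:
        return ALPHABET[0]
    out = []
    while n > 0:
        n, rem = divmod(n, 62)
        out.append(ALPHABET[rem])
    return "".join(reversed(out))


def hash_code(long_url: str, salt: int = 0) -> str:
    """Derive a short code from the URL; a different salt gives a different code."""
    digest = hashlib.sha256(f"{salt}:{long_url}".encode("utf-8")).digest()
    # Keep only 40 bits so the code is at most 7 characters (2**40 < 62**7).
    truncated = int.from_bytes(digest[:5], "big")
    return base62(truncated)


def assign_code(long_url: str, table: dict) -> str:
    """Store the mapping in `table`, re-salting when another URL owns the code."""
    salt = 0
    while True:
        code = hash_code(long_url, salt)
        existing = table.setdefault(code, long_url)
        if existing == long_url:
            return code  # new entry, or the same URL submitted again
        salt += 1
```

The same URL maps to the same code on repeat submissions, but only while no collision forces a re-salt, which is part of why this option gets awkward at scale.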

Option 2: Random Code

Example:

random base62 string → abc123

Pros:

Easy to generate
Good distribution

Cons:

Need collision check
Retry required on conflict
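
Option 2 can be sketched as follows, assuming 7-character codes and at most 5 attempts (both numbers are arbitrary). The dict again stands in for the database; a real store would need an atomic insert-if-absent, such as a unique constraint:

```python
import secrets

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"


def random_code(length: int = 7) -> str:
    """Draw `length` characters uniformly from the Base62 alphabet."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def create_with_retry(long_url: str, table: dict, length: int = 7,
                      max_attempts: int = 5) -> str:
    """Insert under a fresh random code, retrying when the code is taken."""
    for _ in range(max_attempts):
        code = random_code(length)
        # Check-then-insert is only safe here because the dict is local;
        # a shared store must do this step atomically.
        if code not in table:
            table[code] = long_url
            return code
    raise RuntimeError("could not find a free short code")
```

Retries stay rare while the code space is sparsely used, and become more frequent as it fills up.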

Option 3: Auto-increment ID + Base62

Example:

ID = 125000
Base62(ID) = ww8 (using the alphabet 0-9, a-z, A-Z)

Pros:

Guaranteed uniqueness
Short and compact
Easy to reason about

Cons:

Centralized ID generation can become a bottleneck
Predictable codes unless obfuscated
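
For reference, a small Base62 encoder and decoder. The alphabet order (digits, then lowercase, then uppercase) is an assumption; any fixed ordering of the 62 characters works as long as both directions share it:

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def encode(n: int) -> str:
    """Convert a non-negative integer ID to its Base62 string."""
    if n < 0:
        raise ValueError("ID must be non-negative")
    if n == 0:
        return ALPHABET[0]
    out = []
    while n > 0:
        n, rem = divmod(n, 62)
        out.append(ALPHABET[rem])
    return "".join(reversed(out))


def decode(code: str) -> int:
    """Convert a Base62 string back to the integer ID."""
    n = 0
    for ch in code:
        n = n * 62 + INDEX[ch]
    return n


print(encode(125000))  # ww8
print(decode("ww8"))   # 125000
```

Seven characters cover 62**7 IDs, roughly 3.5 trillion, which is why codes stay short even at large scale.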

Recommended Approach

Use:

Distributed ID Generator → Base62 Encode → shortCode

Examples:

Snowflake ID
Database sequence with range allocation
ID generation service
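
Range allocation is probably the easiest of these to sketch. In the version below, `RangeAllocator` is a hypothetical stand-in for a database sequence that hands out blocks, and each `Worker` (an app server) consumes its block locally without further coordination. The class names and block size are assumptions:

```python
import threading


class RangeAllocator:
    """Stand-in for a central sequence that leases blocks of IDs."""

    def __init__(self, block_size: int = 1000) -> None:
        self._next_start = 1
        self._block_size = block_size
        self._lock = threading.Lock()

    def lease(self) -> range:
        # Only this step is coordinated; it runs once per block, not once per ID.
        with self._lock:
            start = self._next_start
            self._next_start += self._block_size
        return range(start, start + self._block_size)


class Worker:
    """An app server drawing IDs from its leased block (not thread-safe)."""

    def __init__(self, allocator: RangeAllocator) -> None:
        self._allocator = allocator
        self._ids = iter(())

    def next_id(self) -> int:
        try:
            return next(self._ids)
        except StopIteration:
            # Block exhausted (or never leased): fetch a new one.
            self._ids = iter(self._allocator.lease())
            return next(self._ids)
```

A worker that crashes loses the rest of its block, so the ID space has gaps. That is harmless for short codes, since uniqueness is what matters, not density.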

👉 Interview Answer

I would use a distributed ID generator and encode the generated ID using Base62.

This gives us uniqueness, compact short codes, and avoids repeated collision checks.

If custom aliases are supported, I would enforce uniqueness through a database constraint.
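
As an illustration of the constraint-based check, here is a sketch using SQLite from the Python standard library. The `claim_alias` helper and the reduced two-column table are assumptions made for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE url_mapping ("
    " short_code TEXT PRIMARY KEY,"
    " long_url TEXT NOT NULL)"
)


def claim_alias(conn: sqlite3.Connection, alias: str, long_url: str) -> bool:
    """Try to insert the alias; the primary key rejects duplicates atomically."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO url_mapping (short_code, long_url) VALUES (?, ?)",
                (alias, long_url),
            )
        return True
    except sqlite3.IntegrityError:
        return False


print(claim_alias(conn, "my-link", "https://example.com/a"))  # True
print(claim_alias(conn, "my-link", "https://example.com/b"))  # False
```

Letting the insert fail is safer than a separate "does it exist?" query followed by an insert, because two concurrent requests cannot both pass the check. Custom aliases and generated codes share one key space here, so they need to be kept from colliding, for example by length or prefix rules.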

Core Insight

The hardest part of short code generation is not the encoding; it is guaranteeing uniqueness at scale.

5️⃣ Data Model

URL Mapping Table

CREATE TABLE url_mapping (
  short_code VARCHAR PRIMARY KEY,
  long_url TEXT NOT NULL,
  user_id VARCHAR,
  created_at TIMESTAMP,
  expires_at TIMESTAMP,
  status VARCHAR
);

Analytics Table

CREATE TABLE url_click_event (
  event_id VARCHAR PRIMARY KEY,
  short_code VARCHAR,
  clicked_at TIMESTAMP,
  user_agent TEXT,
  ip_hash VARCHAR,
  country VARCHAR,
  referrer TEXT
);

Why Separate Mapping and Analytics?

Mapping table is latency-sensitive
Analytics is write-heavy
Analytics can be async
Avoid slowing down redirect path

👉 Interview Answer

I would separate the URL mapping table from analytics events.

The mapping table serves the redirect path and must be optimized for low latency.

Analytics events can be written asynchronously so they do not affect user-facing redirect performance.

6️⃣ Redirect Flow

Basic Flow

User visits short URL
Load balancer routes request
Redirect service extracts short code
Check cache
If cache miss, query database
Validate status and expiration
Return HTTP redirect
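
The lookup part of this flow can be sketched as below. `cache` and `db` are plain dicts standing in for Redis and the database. The record layout, the `"active"` status value, and returning 404 for non-active links are assumptions made for the example:

```python
from datetime import datetime, timezone
from typing import Optional, Tuple


def resolve(short_code: str, cache: dict, db: dict,
            now: datetime) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, redirect target) for a short code.

    Records look like:
    {"long_url": str, "status": str, "expires_at": datetime or None}
    """
    record = cache.get(short_code)
    if record is None:
        record = db.get(short_code)   # cache miss: fall back to the database
        if record is None:
            return 404, None
        cache[short_code] = record    # populate the cache for later requests
    if record["status"] != "active":
        return 404, None              # assumed handling for blocked links
    expires_at = record["expires_at"]
    if expires_at is not None and expires_at <= now:
        return 410, None
    return 302, record["long_url"]


now = datetime(2026, 6, 1, tzinfo=timezone.utc)
db = {"abc123": {"long_url": "https://example.com/x", "status": "active",
                 "expires_at": datetime(2026, 12, 31, tzinfo=timezone.utc)}}
print(resolve("abc123", {}, db, now))  # (302, 'https://example.com/x')
```

Expiration is checked even on a cache hit, so a cached entry cannot keep an expired link alive.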

301 vs 302

Redirect Type	Meaning	Use Case
301	Permanent redirect	Static links; browsers may cache it, so repeat clicks can skip the service
302	Temporary redirect	Analytics and control; each click reaches the service

7️⃣ Caching Strategy

Cache What?

shortCode → longUrl

Cache Layers

CDN / edge cache
Redis / Memcached
Local in-memory cache for hot links

Cache Challenges

Expired links
Updated destination URLs
Abuse blocking
Cache invalidation

Recommended Strategy

Cache active mappings
Use TTL aligned with expiration
Invalidate cache on update/delete/block
Use negative caching for missing short codes
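
A small in-process sketch of that strategy, assuming a hypothetical `MappingCache` class with a 300-second default TTL and a 30-second negative TTL (both numbers are placeholders). The clock is injectable so the behavior can be tested without sleeping:

```python
import time
from typing import Callable, Optional

MISSING = object()  # sentinel cached for short codes known not to exist


class MappingCache:
    def __init__(self, default_ttl: float = 300.0, negative_ttl: float = 30.0,
                 clock: Callable[[], float] = time.time) -> None:
        self._entries = {}  # short_code -> (value, cache_expiry_timestamp)
        self._default_ttl = default_ttl
        self._negative_ttl = negative_ttl
        self._clock = clock

    def put(self, code: str, long_url: str,
            link_expires_at: Optional[float] = None) -> None:
        expiry = self._clock() + self._default_ttl
        if link_expires_at is not None:
            expiry = min(expiry, link_expires_at)  # never outlive the link
        self._entries[code] = (long_url, expiry)

    def put_missing(self, code: str) -> None:
        """Negative caching: remember briefly that the code does not exist."""
        self._entries[code] = (MISSING, self._clock() + self._negative_ttl)

    def get(self, code: str):
        """Return the long URL, MISSING for a cached miss, or None if absent."""
        entry = self._entries.get(code)
        if entry is None:
            return None
        value, expiry = entry
        if self._clock() >= expiry:
            del self._entries[code]
            return None
        return value

    def invalidate(self, code: str) -> None:
        """Call on update, delete, or block."""
        self._entries.pop(code, None)
```

The short negative TTL limits how long a newly created code could be reported as missing, while still absorbing scans for codes that do not exist.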

👉 Interview Answer

Since redirects are read-heavy, caching is one of the most important optimizations.

I would cache shortCode-to-longUrl mappings in Redis and optionally at the edge for very hot links.

However, I need careful TTL and invalidation logic to handle expiration, updates, and abuse blocking.

8️⃣ Trade-offs

Uniqueness vs Simplicity

Random code is simple but needs collision handling
ID-based code guarantees uniqueness but is predictable (enumerable) unless obfuscated

Latency vs Analytics Accuracy

Synchronous analytics is accurate but slower
Async analytics is faster but may lose some events

Availability vs Consistency

Redirect path should favor availability
Creation path needs stronger consistency for uniqueness

Custom Alias vs Collision Risk

Custom alias improves UX
But requires uniqueness checks and a reserved-word list

👉 Interview Answer

The main trade-offs are around uniqueness, latency, and availability.

For URL creation, I need strong uniqueness guarantees. For redirects, I prioritize low latency and high availability.

Analytics should usually be asynchronous because it should not slow down the redirect path.

9️⃣ Scaling Patterns

Pattern 1: Read-heavy Optimization

Redirect traffic is much higher than creation traffic.

Use:

Cache
CDN
Read replicas
Global routing

Pattern 2: Distributed ID Generation

Avoid single database bottleneck.

Use:

Snowflake-style IDs
Segment allocation
Dedicated ID service

Pattern 3: Async Analytics

Redirect path:

redirect request → return redirect → publish click event async

Analytics pipeline:

Kafka / Queue → Stream Processing → Analytics DB
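
The "publish without blocking" idea can be sketched with a bounded in-memory queue standing in for Kafka. `ClickBuffer` and its sizes are assumptions. The point is that a full or failing buffer drops the event instead of delaying the redirect:

```python
import queue


class ClickBuffer:
    """Bounded buffer between the redirect path and the analytics consumer."""

    def __init__(self, maxsize: int = 10_000) -> None:
        self._queue = queue.Queue(maxsize=maxsize)
        self.dropped = 0  # worth exporting as a metric

    def publish(self, event: dict) -> bool:
        """Called on the redirect path: never blocks, may drop."""
        try:
            self._queue.put_nowait(event)
            return True
        except queue.Full:
            self.dropped += 1
            return False

    def drain(self, batch_size: int = 500) -> list:
        """Called by the consumer: pull up to `batch_size` events for a bulk write."""
        batch = []
        while len(batch) < batch_size:
            try:
                batch.append(self._queue.get_nowait())
            except queue.Empty:
                break
        return batch
```

This makes the accuracy trade-off concrete: under overload some click events are lost, but redirect latency is unaffected.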

Pattern 4: Database Sharding

Shard by:

short_code hash
user_id
creation time for analytics
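
Sharding by short_code hash might look like this. The `shard_for` helper and the choice of SHA-256 are assumptions; the key requirement is a hash that is stable across processes:

```python
import hashlib


def shard_for(short_code: str, num_shards: int) -> int:
    """Map a short code to a shard index using a stable hash."""
    digest = hashlib.sha256(short_code.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

Python's built-in `hash()` is randomized per process for strings, so it is not suitable here. Plain modulo also remaps most keys when the shard count changes; consistent hashing is the usual way around that if resharding is expected.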

Pattern 5: Multi-region Deployment

For global scale:

Deploy redirect service near users
Replicate URL mappings
Use eventual consistency for reads
Use single-region or strongly coordinated writes for creation

👉 Interview Answer

At scale, I would optimize the read-heavy redirect path first.

I would use caching, distributed ID generation, asynchronous analytics, and database sharding.

For global traffic, I would deploy redirect services in multiple regions and replicate URL mappings close to users.

🔟 Failure Handling & Edge Cases

Common Failures

Short code not found
Expired URL
Database unavailable
Cache stale
Analytics pipeline down
Malicious URL detected

Strategies

Return 404 for missing code
Return 410 for expired link
Serve from cache during DB failure
Retry analytics asynchronously
Block unsafe URLs
Rate limit suspicious users
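
A sketch of how these strategies could combine in the redirect handler. `DatabaseUnavailable`, the record layout, and the 503 with `Retry-After` for cold links during an outage are assumptions; the notes only specify 404 and 410:

```python
from typing import Callable, Dict, Optional, Tuple


class DatabaseUnavailable(Exception):
    """Raised by the lookup function when the database cannot be reached."""


def handle(short_code: str, cache: dict,
           db_lookup: Callable[[str], Optional[dict]],
           publish: Callable[[dict], None],
           now: float) -> Tuple[int, Dict[str, str]]:
    """Return (status, headers); records are {"long_url", "expires_at"}."""
    try:
        # Hot links are answered from the cache without touching the database.
        record = cache.get(short_code) or db_lookup(short_code)
    except DatabaseUnavailable:
        return 503, {"Retry-After": "5"}  # assumed response for cold links
    if record is None:
        return 404, {}
    if record["expires_at"] is not None and record["expires_at"] <= now:
        return 410, {}
    try:
        publish({"short_code": short_code, "clicked_at": now})
    except Exception:
        pass  # an analytics failure must not fail the redirect
    return 302, {"Location": record["long_url"]}
```

In the checks I would run against this, a cached link still redirects while both the database and the analytics publisher are failing.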

👉 Interview Answer

The redirect path should degrade gracefully.

If analytics is down, redirects should still work. If the database is temporarily unavailable, we may still serve hot links from cache.

For missing or expired URLs, we return clear error responses like 404 or 410.

1️⃣1️⃣ Security & Abuse Prevention

Risks

Phishing links
Malware URLs
Spam generation
Brute-force scanning of short codes

Protection

Rate limiting
CAPTCHA for suspicious users
Safe browsing checks
Domain blocklist
Reserved aliases
Non-sequential or obfuscated codes
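
One way to get non-sequential codes from sequential IDs is a reversible scramble applied before Base62 encoding. The sketch below assumes a 40-bit ID space and arbitrary constants. It is obfuscation only, not encryption; a keyed format-preserving cipher would be the stronger choice:

```python
MODULUS = 2 ** 40            # assumed ID space; fits in 7 Base62 characters
MULTIPLIER = 0x5DEECE66D     # arbitrary odd constant, so it is invertible mod 2**40
XOR_MASK = 0x2A5F3C9D17      # arbitrary 40-bit mask
INVERSE = pow(MULTIPLIER, -1, MODULUS)  # modular inverse (Python 3.8+)


def scramble(n: int) -> int:
    """Map a sequential ID to a scattered value in the same range."""
    if not 0 <= n < MODULUS:
        raise ValueError("ID out of range")
    return ((n * MULTIPLIER) % MODULUS) ^ XOR_MASK


def unscramble(m: int) -> int:
    """Invert scramble(), recovering the original ID."""
    if not 0 <= m < MODULUS:
        raise ValueError("value out of range")
    return ((m ^ XOR_MASK) * INVERSE) % MODULUS
```

Because the mapping is a bijection on the ID space, uniqueness is preserved: two different IDs can never produce the same code.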

👉 Interview Answer

URL shorteners are often abused for phishing and spam, so security is a core part of the design.

I would add rate limiting, malicious URL detection, domain blocklists, and monitoring for suspicious traffic patterns.
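
For the rate-limiting piece, a token bucket is one common choice. The sketch below is a single in-process bucket with an injectable clock. In practice there would be one bucket per user or IP, typically kept in a shared store, and the rate and capacity values are placeholders:

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = capacity
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self._clock()
        # Refill based on elapsed time, capped at the bucket capacity.
        elapsed = now - self._last
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False
```

Creation requests could be charged a higher `cost` than redirects, since spam generation is the more expensive abuse to absorb.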

1️⃣2️⃣ End-to-End Flow

Create Flow

User submits long URL
Validate URL
Check abuse rules
Generate unique ID
Encode ID to Base62 short code
Save mapping
Return short URL

Redirect Flow

User opens short URL
Extract short code
Check cache
Query DB on cache miss
Validate expiration/status
Emit analytics event async
Return 302 redirect

Key Insight

A URL shortener looks simple on the surface, but the real design challenge is building a low-latency, highly available redirect system.

🧠 Staff-Level Answer (Final)

👉 Interview Answer (Full Version)

When designing a URL shortener, I think of it as two main flows: URL creation and URL redirection.

The creation flow needs to generate globally unique short codes, usually by using a distributed ID generator and encoding the ID with Base62.

The redirect flow carries far more traffic than creation, so I would optimize it with caching, read replicas, and potentially edge deployment.

I would store the core shortCode-to-longUrl mapping separately from analytics data, because redirects must stay low-latency while analytics can be processed asynchronously.

The main trade-offs are uniqueness, latency, availability, freshness, and analytics accuracy.

At scale, I would use distributed ID generation, cache hot links, shard the database, and send click events through an async pipeline.

Ultimately, the goal is to provide fast and reliable redirects while maintaining uniqueness, security, and observability.

⭐ Final Insight

A URL shortener is not just about making URLs shorter; it is about building a highly available redirect system with globally unique identifiers.

