System Design Deep Dive - 12 Design Video Streaming

Post by ailswan May. 05, 2026


🎯 Design Video Streaming

1️⃣ Core Framework

When discussing Video Streaming design, I frame it as:

  1. Video upload and ingestion
  2. Metadata and storage design
  3. Transcoding and encoding pipeline
  4. Chunking and adaptive bitrate streaming
  5. CDN and playback delivery
  6. Recommendation, caching, and hot content handling
  7. Security, DRM, and access control
  8. Trade-offs: latency vs quality vs cost

2️⃣ Core Requirements


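Functional Requirements

  1. Upload videos
  2. Transcode videos into multiple resolutions and bitrates
  3. Stream videos with adaptive bitrate playback
  4. Store and serve video metadata
  5. Record watch events for analytics
  6. Enforce access control for private and paid content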


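Non-functional Requirements

  1. Low startup latency
  2. Minimal buffering
  3. High availability for playback
  4. Reliable global delivery at scale
  5. Cost-efficient storage, transcoding, and bandwidth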


👉 Interview Answer

A video streaming system has two major flows: video ingestion and video playback.

The ingestion path handles upload, storage, transcoding, chunking, metadata generation, and CDN preparation.

The playback path focuses on low startup latency, adaptive bitrate streaming, caching, and reliable global delivery.


3️⃣ Main APIs


Upload Video

POST /api/videos/upload

Request:

{
  "userId": "u123",
  "title": "My Travel Video",
  "description": "Trip to Japan",
  "visibility": "public"
}

Response:

{
  "videoId": "v789",
  "uploadUrl": "https://upload.example.com/v789"
}

Get Video Metadata

GET /api/videos/{videoId}

Get Playback Manifest

GET /api/videos/{videoId}/manifest

Response:

{
  "videoId": "v789",
  "manifestUrl": "https://cdn.example.com/v789/master.m3u8"
}

Record Watch Event

POST /api/videos/{videoId}/watch-events

👉 Interview Answer

I would separate upload APIs from playback APIs.

Upload APIs return a signed upload URL, while playback APIs return metadata and a streaming manifest.

Watch events should be processed asynchronously because analytics should not block playback.
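A minimal sketch of the non-blocking watch-event path, assuming an in-process bounded queue stands in for a real message queue. The names `WatchEvent`, `record_watch_event`, and `drain` are hypothetical. The handler enqueues and returns immediately, and drops the event if the buffer is full rather than making playback wait.

```python
import queue
from dataclasses import dataclass


@dataclass(frozen=True)
class WatchEvent:
    # Hypothetical event shape; fields loosely mirror the watch_event table.
    video_id: str
    user_id: str
    event_type: str  # e.g. "play", "pause", "seek"
    watch_duration_seconds: int


# Bounded in-process buffer standing in for a real message queue (assumption).
EVENT_BUFFER: queue.Queue = queue.Queue(maxsize=10_000)


def record_watch_event(event: WatchEvent) -> bool:
    """Enqueue without blocking; return False if the event had to be dropped."""
    try:
        EVENT_BUFFER.put_nowait(event)
        return True
    except queue.Full:
        # Losing an analytics event is acceptable; blocking playback is not.
        return False


def drain(max_items: int) -> list[WatchEvent]:
    """Consumer side: pull up to max_items events for asynchronous processing."""
    batch: list[WatchEvent] = []
    while len(batch) < max_items:
        try:
            batch.append(EVENT_BUFFER.get_nowait())
        except queue.Empty:
            break
    return batch
```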


4️⃣ High-Level Architecture


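Ingestion path:

Client
→ Upload Service
→ Object Storage
→ Transcoding Pipeline
→ Chunk Storage
→ Metadata Service (video marked ready)

Playback path:

Client
→ Playback Service (permission check, manifest URL)
→ CDN
→ Chunk Storage (on cache miss)

Analytics path:

Client
→ Analytics Pipeline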

Main Components

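Upload Service

Creates the video record and returns a signed upload URL.

Object Storage

Stores original uploads, transcoded chunks, thumbnails, and manifests.

Transcoding Service

Converts originals into multiple resolutions and bitrates, splits them into chunks, and generates manifests.

Metadata Service

Tracks ownership, visibility, status, duration, and available variants.

CDN

Caches manifests, chunks, and thumbnails at the edge and serves them to players.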

👉 Interview Answer

I would design video streaming with separate ingestion and serving paths.

Original videos are uploaded to object storage. A transcoding pipeline generates multiple resolutions, chunks the video, creates streaming manifests, and stores output files.

Playback is served through CDN for low latency and global scale.


5️⃣ Data Model


Video Metadata Table

CREATE TABLE video (
  video_id VARCHAR PRIMARY KEY,
  owner_id VARCHAR,
  title TEXT,
  description TEXT,
  status VARCHAR,
  visibility VARCHAR,
  duration_seconds INT,
  original_file_location TEXT,
  created_at TIMESTAMP,
  updated_at TIMESTAMP
);

Video Variant Table

CREATE TABLE video_variant (
  variant_id VARCHAR PRIMARY KEY,
  video_id VARCHAR,
  resolution VARCHAR,
  bitrate INT,
  codec VARCHAR,
  manifest_url TEXT,
  storage_path TEXT,
  status VARCHAR
);

Video Chunk Table

CREATE TABLE video_chunk (
  video_id VARCHAR,
  variant_id VARCHAR,
  chunk_index INT,
  duration_seconds INT,
  storage_path TEXT,
  size_bytes BIGINT,
  checksum VARCHAR,
  PRIMARY KEY (variant_id, chunk_index)
);

Watch Event Table

CREATE TABLE watch_event (
  event_id VARCHAR PRIMARY KEY,
  video_id VARCHAR,
  user_id VARCHAR,
  watched_at TIMESTAMP,
  watch_duration_seconds INT,
  device_type VARCHAR,
  region VARCHAR
);

👉 Interview Answer

I would store video metadata separately from video files.

The metadata service tracks video status, ownership, visibility, duration, and available variants.

The actual video bytes, chunks, thumbnails, and manifests are stored in object storage and served through CDN.
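The `status` column in the `video` table implies a lifecycle. A minimal sketch of that lifecycle as an explicit state machine, assuming the illustrative states below (the original design does not enumerate them), so that invalid transitions such as marking a video ready before it has been transcoded are rejected.

```python
from enum import Enum


class VideoStatus(Enum):
    # Hypothetical lifecycle states for the video.status column.
    UPLOADING = "uploading"
    UPLOADED = "uploaded"
    TRANSCODING = "transcoding"
    READY = "ready"
    FAILED = "failed"
    REMOVED = "removed"


# Allowed transitions; anything not listed is rejected.
ALLOWED = {
    VideoStatus.UPLOADING: {VideoStatus.UPLOADED},
    VideoStatus.UPLOADED: {VideoStatus.TRANSCODING},
    VideoStatus.TRANSCODING: {VideoStatus.READY, VideoStatus.FAILED},
    VideoStatus.FAILED: {VideoStatus.TRANSCODING},  # retry a failed transcoding job
    VideoStatus.READY: {VideoStatus.REMOVED},       # takedown
    VideoStatus.REMOVED: set(),
}


def transition(current: VideoStatus, target: VideoStatus) -> VideoStatus:
    """Return the new status, or raise if the transition is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target
```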


6️⃣ Upload Flow


Basic Flow

User requests upload
→ Upload service creates video record
→ Upload service returns signed upload URL
→ Client uploads original video to object storage
→ Upload complete event is emitted
→ Transcoding pipeline starts

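Why Signed Upload URL?

  1. Large video bytes bypass the application servers
  2. Object storage absorbs the upload bandwidth
  3. Access is limited to one video and a short time window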


👉 Interview Answer

For upload, I would not proxy video bytes through the application server.

Instead, the upload service creates a video record and returns a signed upload URL.

The client uploads directly to object storage, and an upload-complete event triggers the transcoding pipeline.
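A minimal sketch of issuing and verifying a signed upload URL with an HMAC, assuming a shared secret between the upload service and the storage layer. Real object stores (for example, S3 presigned URLs) use their own signing schemes; the `expires` and `sig` query parameters, the secret, and the helper names here are hypothetical.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"demo-secret"  # hypothetical; load from a secret store in practice
UPLOAD_HOST = "https://upload.example.com"


def _sign(path: str, expires: int) -> str:
    # Bind the signature to the method, the object path, and the expiry time.
    msg = f"PUT\n{path}\n{expires}".encode()
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()


def make_upload_url(video_id: str, ttl_seconds: int = 900, now: Optional[float] = None) -> str:
    """Return a URL that allows a PUT to this video's path until it expires."""
    now = time.time() if now is None else now
    expires = int(now) + ttl_seconds
    path = f"/{video_id}"
    query = urlencode({"expires": expires, "sig": _sign(path, expires)})
    return f"{UPLOAD_HOST}{path}?{query}"


def verify_upload_url(url: str, now: Optional[float] = None) -> bool:
    """Storage-side check: signature matches and the URL has not expired."""
    now = time.time() if now is None else now
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, ValueError):
        return False
    if now > expires:
        return False
    # Constant-time comparison avoids leaking signature bytes via timing.
    return hmac.compare_digest(sig, _sign(parsed.path, expires))
```

Because the signature covers the path, a URL issued for one video cannot be reused to overwrite another.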


7️⃣ Transcoding Pipeline


Why Transcoding?

Users watch videos on different devices and networks.

So each video needs different:

  1. Resolutions
  2. Bitrates
  3. Codecs


Example Variants

240p  - low bandwidth
480p  - mobile
720p  - standard HD
1080p - full HD
4K    - high quality

Pipeline

Original video uploaded
→ Transcoding job created
→ Workers generate variants
→ Generate thumbnails
→ Generate subtitles if needed
→ Split variants into chunks
→ Generate manifest file
→ Mark video as ready

👉 Interview Answer

Transcoding converts the original uploaded video into multiple resolutions and bitrates.

This allows the player to adapt video quality based on the user’s network and device.

The transcoding pipeline should be asynchronous, because video processing is expensive and slow.
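A sketch of how one transcoding job could fan out into independent per-variant work items. The bitrate ladder is an assumption for illustration, and the commands use ffmpeg's HLS muxer (`-f hls`, `-hls_time`, `-hls_playlist_type vod`, `-hls_segment_filename`). The sketch only builds the commands; it skips renditions taller than the source, since upscaling spends compute and storage without improving quality.

```python
# Assumed bitrate ladder: (height, video bitrate in kbps). Illustrative only.
LADDER = [
    (240, 400),
    (480, 1000),
    (720, 2800),
    (1080, 5000),
    (2160, 16000),
]


def variants_for(source_height: int) -> list[tuple[int, int]]:
    """Keep only renditions at or below the source resolution (no upscaling)."""
    return [(h, kbps) for h, kbps in LADDER if h <= source_height]


def ffmpeg_command(src: str, height: int, kbps: int, segment_seconds: int = 6) -> list[str]:
    """Build one worker's ffmpeg invocation for a single HLS variant.

    Assumes the output directory (e.g. "720p/") already exists.
    """
    out_dir = f"{height}p"
    return [
        "ffmpeg", "-i", src,
        "-vf", f"scale=-2:{height}",  # keep aspect ratio with an even width
        "-c:v", "libx264", "-b:v", f"{kbps}k",
        "-c:a", "aac", "-b:a", "128k",
        "-f", "hls",
        "-hls_time", str(segment_seconds),  # target chunk duration in seconds
        "-hls_playlist_type", "vod",
        "-hls_segment_filename", f"{out_dir}/seg_%05d.ts",
        f"{out_dir}/index.m3u8",
    ]


def plan_jobs(src: str, source_height: int) -> list[list[str]]:
    """One independent job per variant, so workers can run them in parallel."""
    return [ffmpeg_command(src, h, kbps) for h, kbps in variants_for(source_height)]
```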


8️⃣ Chunking and Adaptive Bitrate Streaming


Why Chunking?

Instead of downloading the whole video, the player downloads small chunks.

Example:

chunk duration = 2 to 10 seconds

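Benefits

  1. Faster startup, since playback can begin after the first chunk arrives
  2. Less buffering
  3. Quality can switch between chunks
  4. Chunks are small objects that cache well on CDN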


Adaptive Bitrate Streaming

The player switches quality based on bandwidth.

Good network → 1080p chunks
Poor network → 480p chunks
Network improves → switch back to 720p/1080p

Common Protocols

  1. HLS (HTTP Live Streaming), which uses .m3u8 playlists
  2. MPEG-DASH, which uses .mpd manifests


Manifest File

The manifest describes the available variants and their chunks.

Example:

master.m3u8
→ 480p playlist
→ 720p playlist
→ 1080p playlist
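A sketch of generating the HLS master playlist described above. `#EXTM3U` and `#EXT-X-STREAM-INF` with `BANDWIDTH` (bits per second) and `RESOLUTION` are standard HLS tags; the variant paths and bandwidth numbers are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    name: str           # e.g. "720p"; also used as the playlist directory (assumption)
    width: int
    height: int
    bandwidth_bps: int  # peak bits per second, as the HLS BANDWIDTH attribute expects


def master_playlist(variants: list[Variant]) -> str:
    """Render master.m3u8: one EXT-X-STREAM-INF entry per variant playlist."""
    lines = ["#EXTM3U"]
    for v in sorted(variants, key=lambda x: x.bandwidth_bps):
        lines.append(
            f"#EXT-X-STREAM-INF:BANDWIDTH={v.bandwidth_bps},"
            f"RESOLUTION={v.width}x{v.height}"
        )
        lines.append(f"{v.name}/index.m3u8")  # relative URI of the variant playlist
    return "\n".join(lines) + "\n"


print(master_playlist([
    Variant("480p", 854, 480, 1_200_000),
    Variant("720p", 1280, 720, 3_000_000),
    Variant("1080p", 1920, 1080, 5_500_000),
]))
```

Each `index.m3u8` it points to is a variant playlist that in turn lists that variant's chunks.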

👉 Interview Answer

I would split videos into small chunks and use adaptive bitrate streaming.

The player first downloads a manifest file, then chooses which quality level to fetch based on network bandwidth and device capability.

This improves startup time, reduces buffering, and provides a smoother playback experience.
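A minimal throughput-based ABR sketch, assuming the player measures the download rate of each chunk, smooths it with an exponentially weighted moving average, and picks the highest variant whose bitrate fits under a safety margin. Production players often use buffer-based or hybrid algorithms instead; the ladder, `alpha`, and `safety` values are assumptions.

```python
from typing import Optional

# Assumed variant bitrates in bits per second.
LADDER_BPS = {"240p": 400_000, "480p": 1_000_000, "720p": 2_800_000, "1080p": 5_000_000}


class ThroughputEstimator:
    """EWMA of measured chunk throughput in bits per second."""

    def __init__(self, alpha: float = 0.3) -> None:
        self.alpha = alpha
        self.estimate: Optional[float] = None

    def observe(self, chunk_bytes: int, download_seconds: float) -> float:
        sample = chunk_bytes * 8 / download_seconds
        if self.estimate is None:
            self.estimate = sample
        else:
            # Weight the newest sample by alpha to smooth out short spikes.
            self.estimate = self.alpha * sample + (1 - self.alpha) * self.estimate
        return self.estimate


def choose_variant(throughput_bps: float, safety: float = 0.8) -> str:
    """Highest variant whose bitrate fits within safety * throughput, else the lowest."""
    budget = throughput_bps * safety
    best = min(LADDER_BPS, key=LADDER_BPS.get)  # fall back to the lowest rung
    for name, bps in LADDER_BPS.items():
        if bps <= budget and bps > LADDER_BPS[best]:
            best = name
    return best
```

As the estimate rises after a network improvement, `choose_variant` moves back up to 720p or 1080p, matching the switching behavior described above.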


9️⃣ Playback Flow


Basic Flow

User opens video
→ Playback service checks permission
→ Return video metadata and manifest URL
→ Player downloads manifest from CDN
→ Player downloads video chunks from CDN
→ Player adapts bitrate dynamically
→ Watch events sent asynchronously

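Startup Optimization

  1. Start with a lower bitrate and switch up once bandwidth is measured
  2. Serve the manifest and first chunks from CDN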


👉 Interview Answer

During playback, the service checks whether the user can access the video, then returns metadata and a manifest URL.

The player downloads the manifest and video chunks from CDN.

To reduce startup latency, the player can start with a lower bitrate and switch to higher quality once bandwidth is measured.
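A worked sketch of why starting at a lower bitrate helps, assuming startup delay is dominated by downloading the first chunk (ignoring manifest fetch, DNS, and decoder setup): delay ≈ chunk_seconds × variant_bitrate / available_bandwidth. The numbers are illustrative.

```python
def first_chunk_delay(chunk_seconds: float, bitrate_bps: float, bandwidth_bps: float) -> float:
    """Seconds to download the first chunk: chunk size in bits / link speed."""
    chunk_bits = chunk_seconds * bitrate_bps
    return chunk_bits / bandwidth_bps


# 4-second chunks on an assumed 4 Mbps connection.
for label, bitrate in [("1080p @ 5 Mbps", 5_000_000), ("480p @ 1 Mbps", 1_000_000)]:
    print(label, first_chunk_delay(4, bitrate, 4_000_000), "s")
# prints:
# 1080p @ 5 Mbps 5.0 s
# 480p @ 1 Mbps 1.0 s
```

Starting at 1080p would make the viewer wait about 5 seconds for the first frame, while starting at 480p cuts that to about 1 second.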


🔟 CDN and Caching


Why CDN?

Video traffic is bandwidth-heavy.

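CDN helps by:

  1. Reducing latency by serving from edge locations near viewers
  2. Lowering load on origin storage
  3. Improving global playback performance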


What to Cache?

  1. Video chunks
  2. Manifests
  3. Thumbnails
  4. Previews


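Cache Challenges

  1. Private videos need access control at the edge
  2. Hot videos cause sudden traffic spikes
  3. Removed videos must stop being served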


👉 Interview Answer

CDN is essential for video streaming, because video delivery is extremely bandwidth-heavy.

I would cache video chunks, manifests, thumbnails, and previews at the edge.

For private videos, I would use signed URLs or signed cookies to control access through the CDN.
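A sketch of per-object caching rules at the edge, assuming chunks are immutable once written (so they can be cached for a long time) while manifests can change and get shorter TTLs. The TTL values and file extensions are assumptions; `public`, `max-age`, `immutable`, and `no-store` are standard Cache-Control directives.

```python
def cache_control_for(path: str, is_live: bool = False) -> str:
    """Pick a Cache-Control header for an object served through the CDN."""
    if path.endswith((".ts", ".m4s", ".mp4")):
        # Chunks never change after they are written, in VOD or live.
        return "public, max-age=31536000, immutable"
    if path.endswith((".m3u8", ".mpd")):
        # Live manifests are rewritten every segment; VOD manifests rarely change.
        return "public, max-age=2" if is_live else "public, max-age=300"
    if path.endswith((".jpg", ".webp")):
        # Thumbnails and previews.
        return "public, max-age=86400"
    return "no-store"  # unknown objects are not cached by default
```

Private videos still use these shared-cache rules; access is enforced by the signed URL or signed cookie at the edge. Marking them `Cache-Control: private` would instead stop the CDN from caching them at all.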


1️⃣1️⃣ Hot Video Handling


What Is a Hot Video?

A video that suddenly gets massive traffic.

Examples:


Problems


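Strategies

  1. Rely heavily on CDN caching
  2. Pre-warm popular content
  3. Cache metadata
  4. Replicate video chunks across regions
  5. Degrade non-critical features such as comments or recommendations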


👉 Interview Answer

Hot videos can create sudden traffic spikes.

I would rely heavily on CDN caching, pre-warm popular content, cache metadata, and replicate video chunks across regions.

If needed, non-critical features like comments or recommendations can be degraded while playback remains available.
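A sketch of detecting a hot video so it can be pre-warmed, assuming a sliding-window request counter per video and a fixed threshold. The window, threshold, and the `HotVideoDetector` name are hypothetical; at real scale this counting would more likely run in a stream processor or use approximate counters rather than live in one process.

```python
from collections import defaultdict, deque


class HotVideoDetector:
    """Flags a video as hot when requests in the last window_s seconds exceed threshold."""

    def __init__(self, window_s: float = 60.0, threshold: int = 1000) -> None:
        self.window_s = window_s
        self.threshold = threshold
        self.requests: dict[str, deque] = defaultdict(deque)
        self.hot: set[str] = set()

    def record(self, video_id: str, ts: float) -> bool:
        """Record one request at time ts; return True the first time the video turns hot."""
        q = self.requests[video_id]
        q.append(ts)
        # Evict timestamps that fell out of the sliding window.
        while q and q[0] <= ts - self.window_s:
            q.popleft()
        if len(q) > self.threshold and video_id not in self.hot:
            self.hot.add(video_id)
            return True  # caller would trigger CDN pre-warm and cross-region replication
        return False
```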


1️⃣2️⃣ Live Streaming vs VOD


VOD: Video on Demand

Content is fully uploaded and processed before anyone plays it.


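Live Streaming

Ingestion, transcoding, chunk generation, and delivery all happen in real time while viewers watch.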


Live Flow

Streamer uploads live feed
→ Ingest server
→ Real-time transcoding
→ Segment generation
→ CDN
→ Viewer playback

👉 Interview Answer

Video-on-demand is easier because content can be processed before playback.

Live streaming is harder because ingestion, transcoding, chunk generation, and delivery all happen in real time.

For live streaming, the key trade-off is latency versus reliability and playback stability.
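A sketch of live segment generation: the packager keeps a sliding window of the most recent segments in a live HLS media playlist and advances `EXT-X-MEDIA-SEQUENCE` as old segments fall off. Omitting `EXT-X-ENDLIST` tells players the stream is still live. The window size, segment duration, and the `LivePlaylist` name are assumptions. Shorter segments lower latency but leave players less buffer to absorb network hiccups, which is the latency-versus-stability trade-off above.

```python
import math
from collections import deque


class LivePlaylist:
    """Sliding-window live HLS media playlist (hypothetical packager helper)."""

    def __init__(self, window: int = 5, target_duration: float = 4.0) -> None:
        self.window = window
        self.target_duration = target_duration
        self.segments: deque = deque()  # entries: (sequence number, duration, uri)
        self.next_seq = 0

    def add_segment(self, duration: float, uri: str) -> None:
        self.segments.append((self.next_seq, duration, uri))
        self.next_seq += 1
        if len(self.segments) > self.window:
            self.segments.popleft()  # oldest segment leaves the live window

    def render(self) -> str:
        first_seq = self.segments[0][0] if self.segments else self.next_seq
        lines = [
            "#EXTM3U",
            "#EXT-X-VERSION:3",
            f"#EXT-X-TARGETDURATION:{math.ceil(self.target_duration)}",
            f"#EXT-X-MEDIA-SEQUENCE:{first_seq}",
        ]
        for _, duration, uri in self.segments:
            lines.append(f"#EXTINF:{duration:.3f},")
            lines.append(uri)
        # No #EXT-X-ENDLIST: the stream is still live.
        return "\n".join(lines) + "\n"
```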


1️⃣3️⃣ Analytics Pipeline


Watch Events

Examples:

  1. Play
  2. Pause
  3. Seek
  4. Buffering
  5. Quality switch
  6. Watch duration


Flow

Player emits watch events
→ Analytics ingestion
→ Queue
→ Stream processing
→ Aggregated metrics
→ Recommendation / reporting

Use Cases

  1. Analytics
  2. Recommendations
  3. Creator dashboards
  4. Quality-of-experience monitoring


👉 Interview Answer

Watch analytics should be processed asynchronously.

The player emits events like play, pause, seek, buffering, quality switch, and watch duration.

These events feed analytics, recommendations, creator dashboards, and quality-of-experience monitoring.
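A sketch of the aggregation step in the stream processor, assuming events arrive as dicts shaped like the `watch_event` table plus an `event_type` field (an assumption, as is the `heartbeat` type in the example). It rolls events up into per-video view counts and total watch time. A real pipeline would also window events by time; this sketch only deduplicates by `event_id`, since queues typically deliver at least once.

```python
from collections import defaultdict


def aggregate(events: list[dict]) -> dict[str, dict[str, int]]:
    """Roll watch events up into per-video views and watch seconds."""
    seen: set[str] = set()
    totals: dict[str, dict[str, int]] = defaultdict(lambda: {"views": 0, "watch_seconds": 0})
    for e in events:
        if e["event_id"] in seen:
            continue  # at-least-once delivery can redeliver the same event
        seen.add(e["event_id"])
        stats = totals[e["video_id"]]
        if e["event_type"] == "play":
            stats["views"] += 1
        stats["watch_seconds"] += e.get("watch_duration_seconds", 0)
    return dict(totals)
```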


1️⃣4️⃣ Security, DRM, and Access Control


Access Control

Support:

  1. Public videos
  2. Private videos
  3. Paid content


Signed URLs

Used to protect CDN access.

The URL is valid only for a limited time.

DRM

Used for premium content.

Supports:


Takedown / Moderation

Need to support:


👉 Interview Answer

Video streaming requires strong access control.

For private or paid content, I would use signed URLs, signed cookies, and possibly DRM.

The playback service should check permissions before returning the manifest, and CDN access should be protected to prevent direct unauthorized downloads.
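A sketch of the check the playback service runs before returning a manifest URL. The `public` and `private` visibility values follow the API example and the answer above; the `paid` flag, the `removed` status value, and the `purchased` entitlement set are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Video:
    video_id: str
    owner_id: str
    visibility: str    # "public" or "private"
    status: str        # e.g. "ready", "removed"
    paid: bool = False


@dataclass
class Viewer:
    user_id: Optional[str]                             # None for anonymous viewers
    purchased: set[str] = field(default_factory=set)  # hypothetical entitlements


def can_play(video: Video, viewer: Viewer) -> bool:
    """Decide access before any manifest URL is handed out."""
    if video.status != "ready":
        return False  # removed or still processing: never serve
    is_owner = viewer.user_id == video.owner_id
    if video.visibility == "private" and not is_owner:
        return False
    if video.paid and not is_owner and video.video_id not in viewer.purchased:
        return False
    return True
```

Only after `can_play` succeeds would the service return a short-lived signed manifest URL, so the CDN never serves content the check would have denied.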


1️⃣5️⃣ Storage and Cost Control


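Storage Types

  1. Original uploads, which can move to cold storage
  2. Transcoded variants and chunks
  3. Thumbnails and manifests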


Cost Challenges

  1. One upload produces many variants
  2. Bandwidth for video delivery
  3. Transcoding compute


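Strategies

  1. Lifecycle policies
  2. Cold storage for originals
  3. Selective transcoding
  4. CDN caching to reduce origin bandwidth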


👉 Interview Answer

Video storage is expensive because one upload can produce many variants.

I would use lifecycle policies, cold storage for originals, selective transcoding, and CDN caching to control storage and bandwidth cost.
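A back-of-the-envelope sketch of why variants dominate storage: each rendition takes roughly bitrate × duration / 8 bytes. The ladder bitrates are assumptions, and the estimate ignores audio and container overhead.

```python
def variant_bytes(bitrate_bps: int, duration_s: int) -> int:
    """Approximate stored size of one rendition: bits per second * seconds / 8."""
    return bitrate_bps * duration_s // 8


# Assumed ladder for a 10-minute (600 s) video: 240p, 480p, 720p, 1080p.
ladder_bps = [400_000, 1_000_000, 2_800_000, 5_000_000]
total = sum(variant_bytes(b, 600) for b in ladder_bps)
print(total)  # 690000000 bytes, about 690 MB for one upload
```

The 1080p rendition alone accounts for 375 MB of that total, so one way to apply selective transcoding is to defer the highest renditions until a video actually attracts viewers.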


1️⃣6️⃣ Failure Handling


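Common Failures

  1. Interrupted uploads
  2. Failed transcoding jobs
  3. Unavailable high-resolution chunks
  4. Analytics pipeline failures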


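Strategies

  1. Resumable uploads with retry
  2. Retryable transcoding jobs tracked through a queue
  3. Player fallback to lower resolution
  4. Asynchronous analytics that never blocks playback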


👉 Interview Answer

The system should handle failures without breaking playback whenever possible.

Uploads should support retry and resume. Transcoding jobs should be retryable and tracked through a queue.

If high-resolution chunks are unavailable, the player can fall back to lower resolution.

Analytics failures should not block playback.
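A sketch of the player-side fallback: retry the current variant a bounded number of times, then step down the ladder rather than stalling. `fetch` is a hypothetical callable standing in for the HTTP request to the CDN; it is assumed to raise `OSError` on failure.

```python
from typing import Callable, Optional


def fetch_with_fallback(
    chunk_index: int,
    variants: list[str],  # ordered from the current quality down to the lowest
    fetch: Callable[[str, int], bytes],
    attempts_per_variant: int = 2,
) -> tuple[str, bytes]:
    """Return (variant, data) from the first variant that serves the chunk."""
    last_error: Optional[Exception] = None
    for variant in variants:
        for _ in range(attempts_per_variant):
            try:
                return variant, fetch(variant, chunk_index)
            except OSError as err:
                last_error = err  # retry, then fall through to a lower variant
    raise RuntimeError(f"chunk {chunk_index} unavailable in all variants") from last_error
```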


1️⃣7️⃣ Consistency Model


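Stronger Consistency Needed For

  1. Permissions and visibility
  2. Private and paid content
  3. Removed videos
  4. Playback metadata such as video status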


Eventual Consistency Acceptable For

  1. Analytics
  2. Recommendations
  3. View counts
  4. CDN propagation


👉 Interview Answer

Video playback metadata and permissions need stronger correctness, especially for private, paid, or removed videos.

But analytics, recommendations, view counts, and CDN propagation can be eventually consistent.


1️⃣8️⃣ Observability


Key Metrics

  1. Upload success rate
  2. Transcoding lag
  3. CDN cache hit rate
  4. Watch event ingestion lag


Quality of Experience Metrics

  1. Playback startup latency
  2. Buffering ratio
  3. Playback errors


👉 Interview Answer

Observability should focus on both backend health and viewer experience.

I would track upload success, transcoding lag, CDN cache hit rate, playback startup latency, buffering ratio, playback errors, and watch event ingestion lag.
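A sketch of two viewer-experience metrics computed from playback sessions. Definitions vary between teams; here buffering ratio is assumed to be stall time divided by total session time (playing plus stalled), and startup latency is summarized with a nearest-rank percentile such as p95.

```python
import math


def buffering_ratio(stall_seconds: float, playing_seconds: float) -> float:
    """Fraction of session time spent stalled (rebuffering)."""
    total = stall_seconds + playing_seconds
    return stall_seconds / total if total > 0 else 0.0


def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile, e.g. p=95 for p95 startup latency."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]
```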


1️⃣9️⃣ End-to-End Flow


Upload Flow

User requests upload
→ Upload service returns signed URL
→ Client uploads video to object storage
→ Upload-complete event emitted
→ Transcoding jobs created
→ Variants and chunks generated
→ Manifest created
→ Video marked ready

Playback Flow

User opens video
→ Playback service checks permission
→ Return metadata and manifest URL
→ Player downloads manifest from CDN
→ Player downloads chunks from CDN
→ Player switches quality dynamically
→ Watch events sent asynchronously

Analytics Flow

Player emits events
→ Analytics ingestion
→ Queue
→ Stream processing
→ Aggregated metrics
→ Recommendations and dashboards

Key Insight

Video streaming is not just a file download: it is an ingestion, transcoding, chunk delivery, and adaptive playback system.


🧠 Staff-Level Answer (Final)


👉 Interview Answer (Full Version)

When designing a video streaming system, I think of it as two major pipelines: video ingestion and video playback.

In the ingestion pipeline, users upload original videos directly to object storage through signed upload URLs.

After upload completion, an asynchronous transcoding pipeline converts the original video into multiple resolutions and bitrates, generates thumbnails, splits videos into chunks, creates streaming manifests, and marks the video as ready.

In the playback pipeline, the playback service checks user permissions, returns metadata and a manifest URL, and the player downloads manifests and chunks from CDN.

I would use adaptive bitrate streaming, so the player can switch between 480p, 720p, 1080p, or higher quality based on network conditions.

CDN is critical because video delivery is bandwidth-heavy. It reduces latency, lowers origin load, and improves global playback performance.

I would store video metadata separately from video bytes. Metadata tracks ownership, visibility, status, duration, variants, and manifests, while video chunks are stored in object storage.

Watch analytics should be asynchronous and should not block playback.

For security, I would use signed URLs, permission checks, and DRM for premium content.

The main trade-offs are startup latency, video quality, storage cost, bandwidth cost, processing delay, and consistency of permissions.

Ultimately, the goal is to deliver smooth, low-latency playback globally, while keeping storage, transcoding, and bandwidth cost under control.


⭐ Final Insight

The core of video streaming is not simply playing a file; it is a large-scale media distribution system built from upload, transcoding, chunking, CDN delivery, and adaptive bitrate playback.


