System Design Deep Dive - 03 How TikTok Builds Recommendation Pipelines

Posted by ailswan on May 26, 2026

🎯 How TikTok Builds Recommendation Pipelines


1️⃣ Core Framework

When discussing TikTok-style recommendation pipelines, I frame them as:

  1. Event collection
  2. Feature generation
  3. Candidate generation
  4. Ranking
  5. Reranking and policy filters
  6. Real-time feedback loops
  7. Exploration vs exploitation
  8. Trade-offs: relevance vs freshness vs latency vs diversity

2️⃣ The Core Problem

A short-video recommendation system must choose a small set of videos from a massive content pool within a tight latency budget.


Challenges


👉 Interview Memorization

A TikTok-like recommendation system is a low-latency multi-stage ranking system driven by fast feedback loops from user behavior.


3️⃣ High-level Architecture


Online Serving Path

User opens feed

↓

Candidate Generation

↓

Feature Fetch

↓

Ranking Model

↓

Reranking / Filters

↓

Feed Response

Offline and Realtime Pipelines

User Events

↓

Event Stream

↓

Feature Pipelines

↓

Model Training

↓

Model Serving

👉 Interview Memorization

Recommendation systems separate online low-latency serving from offline and realtime pipelines that generate features and train models.


4️⃣ Event Collection

Recommendations improve by learning from user behavior data.


Common Events

  • Impressions
  • Watch time
  • Skips
  • Likes
  • Shares
  • Comments
  • Reports
  • Replays

Event Flow

Client Event

↓

Event Collector

↓

Stream Platform

↓

Feature Store / Data Lake

👉 Interview Memorization

Short-video recommendations depend heavily on high-quality event collection because watch behavior is among the strongest feedback signals.
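
As a rough illustration of what a client event might carry, here is a minimal sketch. The `FeedEvent` class, its field names, and the `to_stream_record` helper are assumptions made for this example, not a real TikTok schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class FeedEvent:
    """Hypothetical client event; field names are illustrative only."""
    user_id: str
    video_id: str
    event_type: str  # e.g. "impression", "watch", "skip", "like", "share"
    watch_ms: int    # milliseconds watched; 0 for non-watch events
    video_ms: int    # video length in milliseconds
    ts_ms: int       # client timestamp in epoch milliseconds

    def completion_ratio(self) -> float:
        """Fraction of the video watched, capped at 1.0 so replays do not exceed it."""
        if self.video_ms <= 0:
            return 0.0
        return min(self.watch_ms / self.video_ms, 1.0)


def to_stream_record(event: FeedEvent) -> str:
    """Serialize an event as one JSON line for the event collector."""
    return json.dumps(asdict(event), sort_keys=True)


event = FeedEvent("u1", "v9", "watch", 12_000, 15_000, 1_700_000_000_000)
record = to_stream_record(event)
```

Keeping both `watch_ms` and `video_ms` on the event lets downstream pipelines derive completion without a second lookup, which is one plausible design choice among several.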


5️⃣ Feature Generation

Features describe users, videos, context, and sessions.


User Features


Video Features


Context Features


👉 Interview Memorization

Recommendation quality depends on fresh user, item, and context features that capture both long-term preference and current session intent.


6️⃣ Candidate Generation

Candidate generation narrows millions of videos down to hundreds or thousands.


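Candidate Sources

  • Embedding retrieval
  • Collaborative filtering
  • Trending content
  • Follow graph
  • Language and region pools
  • Fresh uploads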

Architecture

Huge Video Corpus

↓

Multiple Candidate Generators

↓

Candidate Pool

👉 Interview Memorization

Candidate generation optimizes recall by quickly retrieving a manageable set of possibly relevant videos from a huge corpus.
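
The merge step can be sketched as follows. This is a minimal sketch, assuming each generator is a plain function that returns video ids; `merge_candidates`, the limits, and the stub generators are hypothetical names for illustration.

```python
from typing import Callable, Dict, List

# A generator takes (user_id, limit) and returns video ids, best first.
Generator = Callable[[str, int], List[str]]


def merge_candidates(
    user_id: str,
    generators: Dict[str, Generator],
    per_source_limit: int = 200,
    pool_limit: int = 1000,
) -> Dict[str, List[str]]:
    """Union the output of several generators, deduplicating by video id.

    Returns a mapping of video id -> names of the sources that retrieved it.
    """
    pool: Dict[str, List[str]] = {}
    for name, generate in generators.items():
        for video_id in generate(user_id, per_source_limit)[:per_source_limit]:
            if video_id not in pool and len(pool) >= pool_limit:
                continue  # pool is full; only record extra sources for known ids
            pool.setdefault(video_id, []).append(name)
    return pool


# Stub generators standing in for real retrieval services.
generators = {
    "trending": lambda user_id, limit: ["v1", "v2", "v3"][:limit],
    "follow_graph": lambda user_id, limit: ["v2", "v4"][:limit],
}
pool = merge_candidates("u1", generators, per_source_limit=3, pool_limit=3)
```

Recording which sources retrieved each video is optional, but it can be useful later as a ranking feature or for debugging recall.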


7️⃣ Ranking

Ranking scores each candidate with richer models and features.


Predicted Objectives

  • Watch time
  • Completion
  • Likes
  • Shares
  • Follows
  • Negative feedback

Ranking Flow

Candidate Videos

↓

Fetch Features

↓

Model Scores

↓

Sorted Candidates

👉 Interview Memorization

Ranking spends more compute on a smaller candidate set to estimate engagement, satisfaction, and risk signals more accurately.
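
One common way to turn several predicted objectives into a single sort key is a weighted sum. The sketch below assumes the model outputs are already available as probabilities; the weight values and the names `WEIGHTS`, `final_score`, and `rank` are illustrative assumptions, not figures from any production system.

```python
from typing import Dict, List, Tuple

# Illustrative weights: negative feedback is penalized.
WEIGHTS: Dict[str, float] = {
    "p_complete": 1.0,
    "p_like": 0.5,
    "p_share": 0.8,
    "p_negative": -2.0,
}


def final_score(predictions: Dict[str, float]) -> float:
    """Weighted sum of per-objective predictions; missing objectives count as 0."""
    return sum(w * predictions.get(name, 0.0) for name, w in WEIGHTS.items())


def rank(candidates: Dict[str, Dict[str, float]]) -> List[Tuple[str, float]]:
    """Return (video_id, score) pairs sorted by score, highest first."""
    scored = [(video_id, final_score(p)) for video_id, p in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


candidates = {
    "a": {"p_complete": 0.9, "p_negative": 0.3},
    "b": {"p_complete": 0.6, "p_like": 0.2},
}
ranked = rank(candidates)
```

Here video "a" has the higher completion probability but loses to "b" once its predicted negative feedback is taken into account.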


8️⃣ Reranking

Sorting by model score does not always produce the final feed order.


Reranking Goals

  • Diversity
  • Freshness
  • Creator variety
  • Exploration
  • Policy constraints

Example

Top 10 are all same topic

↓

Reranker mixes topics and creators

👉 Interview Memorization

Reranking adjusts the model-sorted list to improve diversity, safety, freshness, and overall feed experience.
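
The topic-mixing example above can be sketched as a greedy pass that caps how many consecutive videos share a topic. This is one simple heuristic, assumed here for illustration; `rerank_for_diversity` and `max_run` are hypothetical names.

```python
from typing import List, Optional, Tuple

Video = Tuple[str, str]  # (video_id, topic)


def rerank_for_diversity(ranked: List[Video], max_run: int = 2) -> List[Video]:
    """Greedy rerank: avoid more than `max_run` consecutive videos of one topic.

    Keeps model order where possible. If every remaining video has the
    blocked topic, falls back to the highest-ranked remaining video.
    """
    remaining = list(ranked)
    result: List[Video] = []
    while remaining:
        recent = [topic for _, topic in result[-max_run:]]
        blocked: Optional[str] = None
        if len(recent) == max_run and len(set(recent)) == 1:
            blocked = recent[0]
        pick = next(
            (i for i, (_, topic) in enumerate(remaining) if topic != blocked),
            0,
        )
        result.append(remaining.pop(pick))
    return result


ranked = [
    ("v1", "cooking"),
    ("v2", "cooking"),
    ("v3", "cooking"),
    ("v4", "travel"),
    ("v5", "cooking"),
]
mixed = rerank_for_diversity(ranked, max_run=2)
```

The same structure extends to creator variety by keying on creator id instead of topic.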


9️⃣ Real-time Feedback Loop

Short-video systems react quickly to behavior.


Example

User watches cooking videos to completion

↓

Realtime features update

↓

Next feed request includes more cooking candidates

Fresh Signals


👉 Interview Memorization

Real-time feedback loops let the feed adapt within a session instead of waiting for offline model retraining.
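
A session-level interest feature could be maintained roughly like this. The sketch assumes exponentially decayed topic scores updated per watch event; the `SessionInterest` class and the decay value are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List


class SessionInterest:
    """Decayed per-topic interest scores for the current session."""

    def __init__(self, decay: float = 0.8) -> None:
        self.decay = decay
        self.scores: Dict[str, float] = defaultdict(float)

    def update(self, topic: str, completion_ratio: float) -> None:
        """Fade older signals, then add the newest watch signal."""
        for key in self.scores:
            self.scores[key] *= self.decay
        self.scores[topic] += completion_ratio

    def top_topics(self, k: int = 3) -> List[str]:
        """Topics with the highest current score, best first."""
        ordered = sorted(self.scores.items(), key=lambda kv: kv[1], reverse=True)
        return [topic for topic, _ in ordered[:k]]


session = SessionInterest(decay=0.5)
session.update("travel", 1.0)
session.update("cooking", 1.0)
session.update("cooking", 1.0)
```

After two completed cooking videos, "cooking" outranks the earlier "travel" signal, so the next feed request can bias candidate retrieval toward it.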


🔟 Feature Store

Feature stores serve both training and online inference.


Two Views

Offline Feature Store
→ training data

Online Feature Store
→ low-latency serving

Requirements


👉 Interview Memorization

Feature stores keep training and serving features consistent while supporting low-latency online reads and large-scale offline training.
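
One way to keep training and serving consistent is to compute both from a single set of feature definitions. The sketch below assumes in-memory storage and toy features; `FEATURES`, `OnlineStore`, and `offline_training_row` are hypothetical names, not a real feature store API.

```python
from typing import Callable, Dict, List

Event = Dict[str, object]

# Single source of truth for feature logic, shared by both paths.
FEATURES: Dict[str, Callable[[List[Event]], float]] = {
    "avg_completion": lambda ev: (
        sum(float(e["completion"]) for e in ev) / len(ev) if ev else 0.0
    ),
    "skip_rate": lambda ev: (
        sum(1 for e in ev if e["skipped"]) / len(ev) if ev else 0.0
    ),
}


def compute_features(events: List[Event]) -> Dict[str, float]:
    return {name: fn(events) for name, fn in FEATURES.items()}


class OnlineStore:
    """Toy key-value store standing in for a low-latency online store."""

    def __init__(self) -> None:
        self._rows: Dict[str, Dict[str, float]] = {}

    def write(self, user_id: str, events: List[Event]) -> None:
        self._rows[user_id] = compute_features(events)

    def read(self, user_id: str) -> Dict[str, float]:
        # Unknown users get the same defaults the training path would see.
        return self._rows.get(user_id, compute_features([]))


def offline_training_row(user_id: str, events: List[Event], label: int) -> Dict[str, object]:
    """Build one training example with the same feature logic as serving."""
    return {"user_id": user_id, **compute_features(events), "label": label}


events = [
    {"completion": 1.0, "skipped": False},
    {"completion": 0.5, "skipped": True},
]
store = OnlineStore()
store.write("u1", events)
row = offline_training_row("u1", events, label=1)
```

Because both paths call `compute_features`, a change to a feature definition cannot silently apply to only one of them.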


1️⃣1️⃣ Cold Start

Cold start occurs when a new user or a new video has little or no interaction history.


New User

Use:


New Video

Use:


👉 Interview Memorization

Cold start is handled with metadata, content embeddings, popularity priors, and controlled exploration to gather early feedback.
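
A popularity prior can be sketched as a smoothed rate: a new video starts at the prior and moves toward its observed rate as impressions accumulate. The formula is a standard smoothed average; the function name `smoothed_completion_rate` and the prior values are assumptions for illustration.

```python
def smoothed_completion_rate(
    completions: int,
    impressions: int,
    prior_rate: float = 0.3,
    prior_weight: float = 20.0,
) -> float:
    """Blend observed completions with a prior worth `prior_weight` impressions."""
    return (prior_rate * prior_weight + completions) / (prior_weight + impressions)


brand_new = smoothed_completion_rate(0, 0)
early_hit = smoothed_completion_rate(9, 10)
established = smoothed_completion_rate(900, 1000)
```

With no data the estimate equals the prior. Nine completions out of ten impressions only lift it to 0.5, which avoids over-promoting a video on a handful of lucky views, while an established video sits close to its observed rate.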


1️⃣2️⃣ Exploration vs Exploitation

Showing only known favorites can trap the feed in an ever-narrowing set of content.


Exploitation

Show videos likely to perform well now.

Exploration

Try uncertain videos to learn user interest and evaluate new content.

Why Exploration Matters


👉 Interview Memorization

Recommendation systems need exploration to learn, avoid stale feeds, and give new content a chance.
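
Epsilon-greedy is one of the simplest ways to mix the two modes, and it is used here only as an illustration; the source does not say which strategy TikTok uses. The `fill_slots` function and its parameters are hypothetical.

```python
import random
from typing import List


def fill_slots(
    exploit: List[str],
    explore: List[str],
    n_slots: int,
    epsilon: float,
    rng: random.Random,
) -> List[str]:
    """Fill feed slots, taking an exploration video with probability epsilon."""
    exploit_queue, explore_queue = list(exploit), list(explore)
    feed: List[str] = []
    while len(feed) < n_slots and (exploit_queue or explore_queue):
        use_explore = bool(explore_queue) and (
            not exploit_queue or rng.random() < epsilon
        )
        source = explore_queue if use_explore else exploit_queue
        feed.append(source.pop(0))
    return feed


rng = random.Random(42)  # seeded so the example is repeatable
pure_exploit = fill_slots(["a", "b"], ["x"], 2, epsilon=0.0, rng=rng)
pure_explore = fill_slots(["a", "b"], ["x"], 2, epsilon=1.0, rng=rng)
```

In practice epsilon would sit between these extremes, and more careful schemes weight exploration by model uncertainty rather than a fixed probability.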


1️⃣3️⃣ Safety and Policy Filters

Recommendation quality includes safety.


Filters


Where Applied

Before ranking

During reranking

Before final response

👉 Interview Memorization

Safety and policy filters must be part of the recommendation pipeline, not an afterthought.


1️⃣4️⃣ Latency Budget

Feed serving must be fast.


Example Budget

Candidate generation: 30 ms

Feature fetch: 40 ms

Ranking: 50 ms

Reranking: 20 ms

Response assembly: 10 ms

Optimization


👉 Interview Memorization

Recommendation serving is latency-constrained, so candidate retrieval, feature fetching, and model inference must be optimized and often run in parallel.
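
The example budget above sums to 150 ms. The sketch below shows that arithmetic and one way to enforce a per-stage deadline while fetching features in parallel, using `asyncio.wait_for` and `asyncio.gather`. The fetch functions, sleep durations, and default values are stand-ins invented for the example.

```python
import asyncio
from typing import Any, Awaitable, Dict, List

BUDGET_MS = {
    "candidate_generation": 30,
    "feature_fetch": 40,
    "ranking": 50,
    "reranking": 20,
    "response_assembly": 10,
}
TOTAL_BUDGET_MS = sum(BUDGET_MS.values())


async def with_deadline(call: Awaitable[Any], timeout_s: float, default: Any) -> Any:
    """Return the call's result, or `default` if it misses the deadline."""
    try:
        return await asyncio.wait_for(call, timeout=timeout_s)
    except asyncio.TimeoutError:
        return default


async def slow_user_features() -> Dict[str, str]:
    await asyncio.sleep(0.5)  # simulates a store that is too slow
    return {"user": "fresh"}


async def fast_video_features() -> Dict[str, str]:
    await asyncio.sleep(0.001)
    return {"video": "fresh"}


async def fetch_features(timeout_s: float = 0.05) -> List[Dict[str, str]]:
    # Both fetches run concurrently, so latency is the max, not the sum.
    return await asyncio.gather(
        with_deadline(slow_user_features(), timeout_s, {"user": "default"}),
        with_deadline(fast_video_features(), timeout_s, {"video": "default"}),
    )


features = asyncio.run(fetch_features())
```

The slow fetch is cut off at the deadline and replaced by default features, so one slow dependency degrades quality slightly instead of blowing the whole budget.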


1️⃣5️⃣ Training Pipeline

Models are trained on historical events.


Flow

Raw Events

↓

Clean and Join

↓

Create Labels

↓

Generate Training Examples

↓

Train Model

↓

Evaluate

↓

Deploy

Important Issues

  • Delayed labels
  • Bias
  • Feature correctness
  • Online/offline metric mismatch

👉 Interview Memorization

Recommendation training pipelines must handle delayed labels, bias, feature correctness, and online/offline metric mismatch.
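
The "Clean and Join" and "Create Labels" steps, including one way to deal with delayed labels, can be sketched as follows. The label definition (completion of at least 0.8 counts as positive), the waiting window, and the name `build_examples` are assumptions for illustration.

```python
from typing import Dict, List


def build_examples(
    impressions: List[Dict],
    watches: List[Dict],
    now_ms: int,
    label_window_ms: int,
    positive_threshold: float = 0.8,
) -> List[Dict]:
    """Join impressions with watch events and emit labeled examples.

    Impressions younger than `label_window_ms` are skipped because their
    watch event may not have arrived yet (delayed labels).
    """
    completion_by_key = {
        (w["user_id"], w["video_id"]): w["completion"] for w in watches
    }
    examples = []
    for imp in impressions:
        if now_ms - imp["ts_ms"] < label_window_ms:
            continue  # too recent to label reliably
        key = (imp["user_id"], imp["video_id"])
        completion = completion_by_key.get(key, 0.0)  # no watch event -> 0.0
        examples.append(
            {
                "user_id": imp["user_id"],
                "video_id": imp["video_id"],
                "label": 1 if completion >= positive_threshold else 0,
            }
        )
    return examples


impressions = [
    {"user_id": "u1", "video_id": "v1", "ts_ms": 0},
    {"user_id": "u1", "video_id": "v2", "ts_ms": 0},
    {"user_id": "u1", "video_id": "v3", "ts_ms": 900},
]
watches = [{"user_id": "u1", "video_id": "v1", "completion": 0.9}]
examples = build_examples(impressions, watches, now_ms=1000, label_window_ms=500)
```

Here `v3` is held back because it was shown too recently, which avoids recording a false negative for a video the user may still be watching.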


1️⃣6️⃣ Experimentation

Recommendation changes must be tested carefully.


Experiment Metrics


Rollout

Offline evaluation

↓

Small A/B test

↓

Ramp gradually

↓

Monitor guardrails

👉 Interview Memorization

Recommendation systems require experimentation because offline metrics do not always predict real user behavior.
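
Gradual ramping works best when bucket assignment is deterministic, so a user stays in the same arm as the percentage grows. A minimal sketch using a hash of the experiment name and user id follows; `assign_bucket` and the experiment name are hypothetical.

```python
import hashlib


def assign_bucket(user_id: str, experiment: str, treatment_pct: int) -> str:
    """Deterministically map a user to "treatment" or "control".

    The hash yields a stable slot in [0, 100); users with a slot below
    `treatment_pct` are treated, so raising the percentage only adds users.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    slot = int(digest[:8], 16) % 100
    return "treatment" if slot < treatment_pct else "control"


users = [f"user-{i}" for i in range(1000)]
small_test = {u for u in users if assign_bucket(u, "new_ranker", 5) == "treatment"}
ramped = {u for u in users if assign_bucket(u, "new_ranker", 20) == "treatment"}
```

Including the experiment name in the hash keeps assignments independent across experiments, so the same users are not always the first to receive every change.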


1️⃣7️⃣ Failure Handling


Common Failures


Fallbacks

  • Cached feeds
  • Popular content
  • Simpler models
  • Model rollback

👉 Interview Memorization

Recommendation systems need graceful fallbacks such as cached feeds, popular content, simpler models, and model rollback.
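
A fallback chain can be sketched as an ordered list of feed providers tried in turn. The provider names and the `serve_feed` function below are hypothetical stand-ins for real services.

```python
from typing import Callable, List, Tuple

Provider = Callable[[str], List[str]]


def serve_feed(user_id: str, providers: List[Tuple[str, Provider]]) -> Tuple[str, List[str]]:
    """Try providers in priority order; return the first non-empty feed.

    A provider that raises or returns nothing is skipped. The returned
    name records which tier served the request, for observability.
    """
    for name, provider in providers:
        try:
            feed = provider(user_id)
        except Exception:
            continue  # treat any provider failure as "try the next tier"
        if feed:
            return name, feed
    return "empty", []


def broken_ranker(user_id: str) -> List[str]:
    raise TimeoutError("ranking service timed out")


providers = [
    ("ranked", broken_ranker),
    ("cached", lambda user_id: []),            # no cached feed for this user
    ("popular", lambda user_id: ["p1", "p2"]),
]
served_by, feed = serve_feed("u1", providers)
```

Returning the tier name alongside the feed makes it straightforward to monitor how often the system is running in a degraded mode.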


1️⃣8️⃣ Observability


Monitor


👉 Interview Memorization

Recommendation observability must track both system health and model quality, including feature freshness and user feedback.


1️⃣9️⃣ Trade-off Table


Choice                 | Benefit           | Cost
More candidate sources | Better recall     | More latency
Bigger ranking model   | Better accuracy   | Higher serving cost
Realtime features      | Fresher feed      | More pipeline complexity
More exploration       | Better learning   | Short-term relevance risk
More diversity         | Better experience | May reduce predicted engagement
Strict filters         | Safer feed        | Smaller candidate pool

👉 Interview Memorization

Recommendation systems constantly trade relevance, freshness, diversity, safety, latency, and compute cost.


2️⃣0️⃣ Best Practices


Practical Rules


Design Principle

Retrieve broadly.

Rank precisely.

Learn continuously.

👉 Interview Memorization

TikTok-like recommendation pipelines win through multi-stage ranking, fresh behavior signals, fast feedback loops, and careful online experimentation.


🧠 Final Staff-Level Answer


👉 Full Interview Answer

A TikTok-like recommendation system uses a multi-stage pipeline because the content corpus is too large to rank everything directly.

The online serving path first runs multiple candidate generators, such as embedding retrieval, collaborative filtering, trending content, follow graph, language and region pools, and fresh uploads.

It then fetches user, video, and context features and applies a heavier ranking model to predict outcomes like watch time, completion, likes, shares, follows, and negative feedback.

A reranking layer adjusts the list for diversity, freshness, creator variety, exploration, and policy constraints.

The offline and realtime pipelines collect events such as impressions, watch time, skips, likes, shares, comments, reports, and replays.

These events update realtime features, feed model training, and close the feedback loop so the feed can adapt quickly.

The feature store must support both offline training and low-latency online serving while keeping feature definitions consistent.

Cold start is handled with metadata, content embeddings, popularity priors, and controlled exploration.

The main trade-off is relevance and freshness versus latency, diversity, safety, and compute cost.


⭐ Final Insight

The core of a TikTok-style recommendation pipeline is not:

"train one model to rank videos"

but rather:

Event Collection + Candidate Generation + Feature Store + Ranking + Reranking + Realtime Feedback + Experimentation

The most important takeaway:

Retrieve broadly.

Rank precisely.

Learn continuously.


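Condensed Recap

🎯 How TikTok Builds Recommendation Pipelines (TikTok-style recommendation system pipelines)


Core Understanding

Short-video recommendation is not a simple sort.

It is a low-latency, multi-stage system with strong feedback loops.

The core problem:

From a massive pool of videos,

quickly retrieve a small set of candidates,

score them with a stronger model,

and keep learning from user feedback.

High-level Architecture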

User opens feed

↓

Candidate Generation

↓

Feature Fetch

↓

Ranking Model

↓

Reranking / Filters

↓

Feed Response

Data pipeline:

User Events

↓

Event Stream

↓

Feature Pipelines

↓

Model Training

↓

Model Serving

Event Collection

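Recommendation systems depend on user behavior such as impressions, watch time, skips, likes, shares, comments, and reports.

Watch behavior is usually a very strong signal.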


Feature Generation

Common features:

User Features

Video Features

Context Features


Candidate Generation

Ranking every video directly is not feasible.

The first stage retrieves candidates:

Huge Video Corpus

↓

Multiple Candidate Generators

↓

Candidate Pool

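Candidate sources: embeddings, collaborative filtering, trending pools, follow graph, fresh uploads, and regional or language pools.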


Ranking

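Ranking scores the candidate videos more precisely.

Prediction targets may include watch time, completion, engagement, and negative feedback.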


Reranking

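The highest model score does not necessarily determine the final order.

Reranking takes into account diversity, freshness, creator variety, exploration, and safety constraints.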


Real-time Feedback

Short-video recommendation needs to respond quickly to user behavior.

User watches cooking videos

↓

Realtime features update

↓

Next feed has more cooking videos

Cold Start

New users:

New videos:


Exploration vs Exploitation

Exploitation:

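Show the content the user is most likely to enjoy right now.

Exploration:

Try uncertain content to learn new interests.

Without exploration, the feed keeps getting narrower.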


Interview Answer Template

A TikTok-like recommendation system uses a multi-stage pipeline.

Candidate generation first retrieves a manageable set of videos from a huge corpus using embeddings, collaborative filtering, trending pools, follow graph, fresh uploads, and regional or language pools.

Ranking then scores candidates with richer user, video, and context features to predict watch time, completion, engagement, and negative feedback.

Reranking adjusts the final list for diversity, freshness, creator variety, exploration, and safety constraints.

The system depends on realtime event collection from impressions, watch time, skips, likes, shares, comments, and reports.

These events update features, train models, and create fast feedback loops.

The key trade-off is relevance and freshness versus latency, diversity, safety, and compute cost.


Final Summary

Retrieve broadly.

Rank precisely.

Learn continuously.

Core principle:

Candidate Generation + Ranking + Reranking + Realtime Feedback
