🎯 How TikTok Builds Recommendation Pipelines
1️⃣ Core Framework
When discussing TikTok-style recommendation pipelines, I frame them as:
- Event collection
- Feature generation
- Candidate generation
- Ranking
- Reranking and policy filters
- Real-time feedback loops
- Exploration vs exploitation
- Trade-offs: relevance vs freshness vs latency vs diversity
2️⃣ The Core Problem
A short-video recommendation system must choose a small set of videos from a massive content pool within a tight latency budget.
Challenges
- Huge video corpus
- Rapidly changing user interests
- New content cold start
- Low latency serving
- Feedback loops
- Creator fairness
- Safety and policy constraints
- Avoiding repetitive feeds
👉 Interview Memorization
A TikTok-like recommendation system is a low-latency multi-stage ranking system driven by fast feedback loops from user behavior.
3️⃣ High-level Architecture
Online Serving Path
User opens feed
↓
Candidate Generation
↓
Feature Fetch
↓
Ranking Model
↓
Reranking / Filters
↓
Feed Response
Offline and Realtime Pipelines
User Events
↓
Event Stream
↓
Feature Pipelines
↓
Model Training
↓
Model Serving
👉 Interview Memorization
Recommendation systems separate online low-latency serving from offline and realtime pipelines that generate features and train models.
4️⃣ Event Collection
Recommendations improve by learning from behavior data.
Common Events
- Impression
- Watch time
- Completion rate
- Like
- Share
- Comment
- Follow
- Skip
- Not interested
- Report
- Replay
Event Flow
Client Event
↓
Event Collector
↓
Stream Platform
↓
Feature Store / Data Lake
👉 Interview Memorization
Short-video recommendations depend heavily on high-quality event collection because watch behavior is usually one of the strongest feedback signals.
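As a minimal sketch of how raw watch events might become signals such as completion rate and skip, assuming a hypothetical `WatchEvent` schema and an illustrative 2-second skip threshold (neither comes from a real system):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WatchEvent:
    """Hypothetical client event; field names are assumptions for illustration."""
    user_id: str
    video_id: str
    watch_ms: int            # total time the video was on screen
    video_duration_ms: int   # length of the video


def completion_ratio(event: WatchEvent) -> float:
    """Fraction of the video watched, capped at 1.0 so replays do not exceed 100%."""
    if event.video_duration_ms <= 0:
        return 0.0
    return min(event.watch_ms / event.video_duration_ms, 1.0)


def is_skip(event: WatchEvent, threshold_ms: int = 2000) -> bool:
    """Treat very short views as skips; the threshold is an illustrative assumption."""
    return event.watch_ms < threshold_ms


half_watched = WatchEvent("u1", "v1", watch_ms=15_000, video_duration_ms=30_000)
replayed = WatchEvent("u1", "v2", watch_ms=45_000, video_duration_ms=30_000)
swiped_away = WatchEvent("u1", "v3", watch_ms=800, video_duration_ms=30_000)
```

In practice a replay would probably be logged as its own event rather than inferred from watch time; the cap here only keeps the ratio well defined.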
5️⃣ Feature Generation
Features describe users, videos, context, and sessions.
User Features
- Interests
- Watch history
- Skips
- Language
- Region
- Device
- Follow graph
- Long-term preferences
Video Features
- Embedding
- Topic
- Language
- Creator
- Freshness
- Engagement rate
- Safety labels
- Audio or visual signals
Context Features
- Time of day
- Network
- Session length
- Current feed position
- Recent interactions
👉 Interview Memorization
Recommendation quality depends on fresh user, item, and context features that capture both long-term preference and current session intent.
6️⃣ Candidate Generation
Candidate generation narrows millions of videos down to hundreds or thousands.
Candidate Sources
- Similar video embeddings
- User interest clusters
- Follow graph
- Trending videos
- Fresh uploads
- Geo or language pools
- Collaborative filtering
- Creator affinity
Architecture
Huge Video Corpus
↓
Multiple Candidate Generators
↓
Candidate Pool
👉 Interview Memorization
Candidate generation optimizes recall by quickly retrieving a manageable set of possibly relevant videos from a huge corpus.
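The fan-out and merge step can be sketched as follows. This is a simplified illustration, not a production design: the generator names, the `merge_candidates` helper, and the per-source limit are all assumptions.

```python
from typing import Callable, Dict, List

# Hypothetical generator type: takes a user id, returns video ids, best first.
CandidateGenerator = Callable[[str], List[str]]


def merge_candidates(user_id: str,
                     generators: Dict[str, CandidateGenerator],
                     per_source_limit: int = 3) -> Dict[str, List[str]]:
    """Union candidates from all sources and remember which sources proposed each video."""
    pool: Dict[str, List[str]] = {}
    for source_name, generate in generators.items():
        for video_id in generate(user_id)[:per_source_limit]:
            # A video proposed by several sources appears once, with all its sources.
            pool.setdefault(video_id, []).append(source_name)
    return pool


# Stub generators standing in for real retrieval services.
generators = {
    "trending": lambda user_id: ["v1", "v2", "v3", "v4"],
    "follow_graph": lambda user_id: ["v3", "v5"],
    "fresh_uploads": lambda user_id: ["v6", "v1"],
}
pool = merge_candidates("user_42", generators)
```

Keeping the source names alongside each candidate is a design choice that makes it easier to monitor per-source health and to use "number of sources" as a ranking feature.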
7️⃣ Ranking
Ranking scores each candidate with richer models and features.
Predicted Objectives
- Watch time
- Completion probability
- Like probability
- Share probability
- Comment probability
- Follow probability
- Negative feedback probability
- Long-term satisfaction
Ranking Flow
Candidate Videos
↓
Fetch Features
↓
Model Scores
↓
Sorted Candidates
👉 Interview Memorization
Ranking spends more compute on a smaller candidate set to estimate engagement, satisfaction, and risk signals more accurately.
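One common way to turn several predicted objectives into a single ranking score is a weighted sum. The sketch below assumes the model already outputs per-objective probabilities; the `Prediction` fields and the weights are illustrative assumptions, not values from any real system.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Prediction:
    """Hypothetical per-video model outputs."""
    video_id: str
    p_complete: float
    p_like: float
    p_share: float
    p_negative: float


# Illustrative weights only; negative feedback is penalized.
WEIGHTS = {"p_complete": 1.0, "p_like": 0.5, "p_share": 0.8, "p_negative": -2.0}


def combined_score(pred: Prediction) -> float:
    """Weighted sum of the predicted objectives."""
    return sum(weight * getattr(pred, name) for name, weight in WEIGHTS.items())


def rank(predictions: List[Prediction]) -> List[Prediction]:
    """Sort candidates by combined score, best first."""
    return sorted(predictions, key=combined_score, reverse=True)


predictions = [
    Prediction("a", p_complete=0.60, p_like=0.10, p_share=0.02, p_negative=0.01),
    Prediction("b", p_complete=0.70, p_like=0.05, p_share=0.01, p_negative=0.20),
    Prediction("c", p_complete=0.40, p_like=0.30, p_share=0.10, p_negative=0.00),
]
ranked = rank(predictions)
```

Video "b" has the highest completion probability but drops to last place because of its high predicted negative feedback.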
8️⃣ Reranking
The highest model scores do not always determine the final feed order.
Reranking Goals
- Diversity
- Freshness
- Creator variety
- Topic variety
- Safety constraints
- Business rules
- Avoid repeated content
- Exploration quota
Example
Top 10 are all same topic
↓
Reranker mixes topics and creators
👉 Interview Memorization
Reranking adjusts the model-sorted list to improve diversity, safety, freshness, and overall feed experience.
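The topic-mixing example above could be implemented with a simple greedy pass. This is one possible heuristic under stated assumptions: the function name and the `max_run` limit are hypothetical, and real rerankers usually balance many constraints at once.

```python
from typing import List, Tuple


def rerank_for_topic_variety(ranked: List[Tuple[str, str]],
                             max_run: int = 2) -> List[Tuple[str, str]]:
    """Greedy rerank of (video_id, topic) pairs, best first.

    Avoids more than `max_run` consecutive videos from the same topic
    when an alternative topic is still available.
    """
    remaining = list(ranked)
    result: List[Tuple[str, str]] = []
    while remaining:
        recent_topics = [topic for _, topic in result[-max_run:]]
        run_is_full = len(recent_topics) == max_run and len(set(recent_topics)) == 1
        pick = 0  # default: take the best remaining candidate
        if run_is_full:
            for i, (_, topic) in enumerate(remaining):
                if topic != recent_topics[0]:
                    pick = i  # best remaining candidate from a different topic
                    break
        result.append(remaining.pop(pick))
    return result


ranked = [("v1", "cooking"), ("v2", "cooking"), ("v3", "cooking"),
          ("v4", "travel"), ("v5", "cooking"), ("v6", "music")]
reranked = rerank_for_topic_variety(ranked)
```

If every remaining candidate shares the same topic, the function falls back to model order rather than dropping videos.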
9️⃣ Real-time Feedback Loop
Short-video systems react quickly to behavior.
Example
User watches cooking videos to completion
↓
Realtime features update
↓
Next feed request includes more cooking candidates
Fresh Signals
- Recent watch completions
- Recent skips
- Session topic interest
- Last few interactions
- Short-term embedding updates
👉 Interview Memorization
Real-time feedback loops let the feed adapt within a session instead of waiting for offline model retraining.
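A session-level interest signal can be approximated with exponentially decayed topic scores. The sketch below is an assumption-laden toy: the `SessionInterest` class, the decay factor, and the reward values (+1.0 for a completion, -0.5 for a skip) are chosen only for illustration.

```python
from collections import defaultdict
from typing import DefaultDict


class SessionInterest:
    """Tracks short-term topic interest within one session."""

    def __init__(self, decay: float = 0.8) -> None:
        self.decay = decay
        self.scores: DefaultDict[str, float] = defaultdict(float)

    def update(self, topic: str, completed: bool) -> None:
        # Older interactions fade so recent behavior dominates.
        for key in self.scores:
            self.scores[key] *= self.decay
        # Completions raise the topic score; skips lower it.
        self.scores[topic] += 1.0 if completed else -0.5

    def top_topic(self) -> str:
        return max(self.scores, key=self.scores.get)


session = SessionInterest()
session.update("travel", completed=True)
session.update("cooking", completed=True)
session.update("cooking", completed=True)
session.update("travel", completed=False)
```

A candidate generator could read `session.top_topic()` on the next feed request to pull in more videos from that topic without waiting for a model retrain.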
🔟 Feature Store
Feature stores serve both training and online inference.
Two Views
Offline Feature Store
→ training data
Online Feature Store
→ low-latency serving
Requirements
- Low-latency reads
- Fresh updates
- Backfill support
- Consistent definitions
- Point-in-time correctness for training
👉 Interview Memorization
Feature stores keep training and serving features consistent while supporting low-latency online reads and large-scale offline training.
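Point-in-time correctness means a training example may only see feature values that existed when the event happened. A minimal sketch, assuming each feature keeps a timestamp-sorted history (the function name and data layout are hypothetical):

```python
import bisect
from typing import List, Optional, Tuple


def point_in_time_value(history: List[Tuple[int, float]],
                        event_ts: int) -> Optional[float]:
    """Return the latest feature value written at or before `event_ts`.

    `history` is a list of (timestamp, value) pairs sorted by timestamp.
    Returns None when no value existed yet, which avoids leaking future data.
    """
    timestamps = [ts for ts, _ in history]
    idx = bisect.bisect_right(timestamps, event_ts)
    return history[idx - 1][1] if idx > 0 else None


# Hypothetical history of a video's engagement-rate feature.
engagement_history = [(100, 0.10), (200, 0.25), (300, 0.40)]
value_at_250 = point_in_time_value(engagement_history, 250)
```

Joining on the latest value instead (0.40 here) would leak information from after the impression into the training example.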
1️⃣1️⃣ Cold Start
Cold start happens for new users or new videos.
New User
Use:
- Region
- Language
- Device
- Signup interests
- Popular local content
- Exploration
New Video
Use:
- Creator history
- Video metadata
- Audio and visual embeddings
- Small test audiences
- Early engagement signals
👉 Interview Memorization
Cold start is handled with metadata, content embeddings, popularity priors, and controlled exploration to gather early feedback.
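A popularity prior can be expressed as a smoothed rate: with few impressions the estimate stays close to the prior, and it moves toward the observed rate as data accumulates. This is one common smoothing approach, not necessarily what any particular system uses; the prior rate and prior strength below are illustrative assumptions.

```python
def smoothed_completion_rate(completions: int,
                             impressions: int,
                             prior_rate: float = 0.3,
                             prior_strength: float = 50.0) -> float:
    """Blend observed completions with a prior worth `prior_strength` pseudo-impressions."""
    return (completions + prior_rate * prior_strength) / (impressions + prior_strength)


brand_new = smoothed_completion_rate(0, 0)          # no data: falls back to the prior
lucky_start = smoothed_completion_rate(3, 4)        # 75% raw rate, but only 4 impressions
established = smoothed_completion_rate(600, 1000)   # 60% raw rate, plenty of data
```

The video with a 75% raw rate on 4 impressions scores below the established video, so early noise does not dominate ranking while exploration gathers more feedback.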
1️⃣2️⃣ Exploration vs Exploitation
Showing only known favorites can trap the feed in a narrow set of interests.
Exploitation
Show videos likely to perform well now.
Exploration
Try uncertain videos to learn user interest and evaluate new content.
Why Exploration Matters
- Discovers new interests
- Tests new videos
- Helps new creators
- Prevents filter bubbles
- Improves long-term quality
👉 Interview Memorization
Recommendation systems need exploration to learn, avoid stale feeds, and give new content a chance.
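One simple way to balance the two is a fixed exploration quota: reserve a few feed slots for uncertain videos and fill the rest in ranker order. The sketch below assumes hypothetical names (`fill_feed`, `explore_slots`) and uses random slot placement; bandit-style methods are a more sophisticated alternative.

```python
import random
from typing import List


def fill_feed(exploit: List[str], explore: List[str], feed_size: int,
              explore_slots: int, rng: random.Random) -> List[str]:
    """Reserve a fixed number of feed slots for exploration candidates."""
    explore_slots = min(explore_slots, len(explore), feed_size)
    if len(exploit) < feed_size - explore_slots:
        raise ValueError("not enough exploitation candidates to fill the feed")
    # Choose which positions are exploration slots and which videos fill them.
    positions = set(rng.sample(range(feed_size), explore_slots))
    explore_picks = iter(rng.sample(explore, explore_slots))
    exploit_picks = iter(exploit)  # assumed already sorted best first by the ranker
    return [next(explore_picks) if i in positions else next(exploit_picks)
            for i in range(feed_size)]


ranked = [f"r{i}" for i in range(10)]   # exploitation candidates, best first
uncertain = ["x1", "x2", "x3"]          # new or uncertain videos
feed = fill_feed(ranked, uncertain, feed_size=8, explore_slots=2,
                 rng=random.Random(0))
```

Passing in the random generator keeps the behavior reproducible in tests while still randomizing positions in production.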
1️⃣3️⃣ Safety and Policy Filters
Recommendation quality includes safety.
Filters
- Policy violations
- Age restrictions
- Region restrictions
- Copyright restrictions
- Blocked creators
- User negative preferences
- Sensitive content limits
Where Applied
Before ranking
During reranking
Before final response
👉 Interview Memorization
Safety and policy filters must be part of the recommendation pipeline, not an afterthought.
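Several of the filters above are hard eligibility checks that can run before ranking. A minimal sketch, assuming hypothetical `Video` and `Viewer` records whose fields are invented for illustration:

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Video:
    video_id: str
    creator_id: str
    min_age: int = 0
    blocked_regions: Set[str] = field(default_factory=set)
    policy_violation: bool = False


@dataclass
class Viewer:
    age: int
    region: str
    blocked_creators: Set[str] = field(default_factory=set)


def is_allowed(video: Video, viewer: Viewer) -> bool:
    """Hard filters: a video failing any check is never shown to this viewer."""
    return (not video.policy_violation
            and viewer.age >= video.min_age
            and viewer.region not in video.blocked_regions
            and video.creator_id not in viewer.blocked_creators)


videos = [
    Video("v1", "c1"),
    Video("v2", "c2", min_age=18),
    Video("v3", "c3", blocked_regions={"DE"}),
    Video("v4", "c4", policy_violation=True),
    Video("v5", "c5"),
]
viewer = Viewer(age=16, region="DE", blocked_creators={"c5"})
allowed = [v.video_id for v in videos if is_allowed(v, viewer)]
```

Running the same check again just before the final response guards against videos whose labels changed while the request was in flight.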
1️⃣4️⃣ Latency Budget
Feed serving must be fast.
Example Budget (150 ms total)
Candidate generation: 30 ms
Feature fetch: 40 ms
Ranking: 50 ms
Reranking: 20 ms
Response assembly: 10 ms
Optimization
- Parallel candidate generators
- Cached features
- Approximate nearest neighbor search
- Model distillation
- Batch inference
- Early timeout fallback
👉 Interview Memorization
Recommendation serving is latency-constrained, so candidate retrieval, feature fetching, and model inference must be optimized and often run in parallel.
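Parallel candidate generators with an early timeout fallback can be sketched with `asyncio`. The generator functions and the 100 ms timeout here are stand-ins chosen for the example; a real service would call remote retrieval backends.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Tuple

AsyncGenerator = Callable[[], Awaitable[List[str]]]


async def run_generators(generators: Dict[str, AsyncGenerator],
                         timeout_s: float) -> Dict[str, List[str]]:
    """Run all generators concurrently; a slow one contributes no candidates."""

    async def guarded(name: str, generate: AsyncGenerator) -> Tuple[str, List[str]]:
        try:
            return name, await asyncio.wait_for(generate(), timeout=timeout_s)
        except asyncio.TimeoutError:
            return name, []  # early timeout fallback: skip this source

    pairs = await asyncio.gather(*(guarded(n, g) for n, g in generators.items()))
    return dict(pairs)


async def trending() -> List[str]:
    await asyncio.sleep(0.01)   # simulated fast source
    return ["v1", "v2"]


async def embedding_search() -> List[str]:
    await asyncio.sleep(0.5)    # simulated slow source that misses the budget
    return ["v9"]


results = asyncio.run(run_generators(
    {"trending": trending, "embedding": embedding_search}, timeout_s=0.1))
```

Because the generators run concurrently, the stage costs roughly as much as the slowest source or the timeout, whichever is smaller, rather than the sum of all sources.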
1️⃣5️⃣ Training Pipeline
Models are trained from historical events.
Flow
Raw Events
↓
Clean and Join
↓
Create Labels
↓
Generate Training Examples
↓
Train Model
↓
Evaluate
↓
Deploy
Important Issues
- Delayed labels
- Position bias
- Feedback loops
- Data leakage
- Point-in-time feature correctness
- Online/offline metric mismatch
👉 Interview Memorization
Recommendation training pipelines must handle delayed labels, bias, feature correctness, and online/offline metric mismatch.
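Delayed labels are often handled with an attribution window: an impression becomes a negative example only after the window has closed without the positive event. A minimal sketch, assuming hypothetical inputs (impression timestamps and a map of like timestamps) and an illustrative one-hour window:

```python
from typing import Dict, List, Tuple


def build_labels(impressions: List[Tuple[str, int]],
                 likes: Dict[str, int],
                 window_s: int,
                 now: int) -> List[Tuple[str, int]]:
    """Create (impression_id, label) pairs for impressions whose label is known.

    impressions: (impression_id, shown_at) pairs.
    likes: impression_id -> timestamp of the like, if any.
    """
    examples = []
    for impression_id, shown_at in impressions:
        liked_at = likes.get(impression_id)
        if liked_at is not None and liked_at - shown_at <= window_s:
            examples.append((impression_id, 1))   # positive within the window
        elif now - shown_at >= window_s:
            examples.append((impression_id, 0))   # window closed with no like
        # Otherwise the window is still open, so the label is not yet known.
    return examples


impressions = [("i1", 1000), ("i2", 2000), ("i3", 9000)]
likes = {"i1": 1500}
examples = build_labels(impressions, likes, window_s=3600, now=10_000)
```

Impression `i3` is left out because its window is still open; labeling it negative now would bias the model against recent content.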
1️⃣6️⃣ Experimentation
Recommendation changes must be tested carefully.
Experiment Metrics
- Watch time
- Retention
- Completion rate
- Likes and shares
- Negative feedback
- Diversity
- Creator ecosystem health
- Long-term satisfaction
Rollout
Offline evaluation
↓
Small A/B test
↓
Ramp gradually
↓
Monitor guardrails
👉 Interview Memorization
Recommendation systems require experimentation because offline metrics do not always predict real user behavior.
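Gradual ramps are usually built on deterministic bucketing, so a user keeps the same variant across requests and stays in treatment as the percentage grows. The sketch below is one common hashing scheme; the function name and experiment key are hypothetical.

```python
import hashlib


def assign_variant(user_id: str, experiment: str, treatment_pct: int) -> str:
    """Deterministically map a user to 'treatment' or 'control'.

    Hashing the experiment name together with the user id gives each
    experiment an independent split of the same user base.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100   # stable bucket in [0, 100)
    return "treatment" if bucket < treatment_pct else "control"


users = [f"user_{i}" for i in range(1000)]
at_5_pct = {u for u in users if assign_variant(u, "new_ranker", 5) == "treatment"}
at_50_pct = {u for u in users if assign_variant(u, "new_ranker", 50) == "treatment"}
```

Every user in the 5% group is also in the 50% group, so ramping up never flips an existing treatment user back to control.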
1️⃣7️⃣ Failure Handling
Common Failures
- Candidate generator timeout
- Feature store latency
- Ranking model failure
- Bad model rollout
- Event pipeline lag
- Feed repetition
- Policy filter failure
Fallbacks
- Popular local videos
- Cached feed
- Simpler model
- Fewer candidate sources
- Safe default policy filters
- Roll back bad model
👉 Interview Memorization
Recommendation systems need graceful fallbacks such as cached feeds, popular content, simpler models, and model rollback.
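The fallback list can be implemented as an ordered chain that tries each strategy until one returns a non-empty feed. The strategy functions below are stubs invented for the example; a real service would also log which fallback fired.

```python
from typing import Callable, List, Sequence, Tuple

Strategy = Callable[[], List[str]]


def serve_with_fallbacks(strategies: Sequence[Tuple[str, Strategy]]) -> Tuple[str, List[str]]:
    """Try strategies in priority order; return the first non-empty feed and its source."""
    for name, strategy in strategies:
        try:
            feed = strategy()
        except Exception:
            continue  # this path failed; move on to the next fallback
        if feed:
            return name, feed
    return "empty", []


def personalized() -> List[str]:
    raise TimeoutError("ranking model timed out")  # simulated failure


def cached_feed() -> List[str]:
    return []  # simulated cache miss


def popular_local() -> List[str]:
    return ["p1", "p2", "p3"]


source, feed = serve_with_fallbacks([
    ("personalized", personalized),
    ("cached_feed", cached_feed),
    ("popular_local", popular_local),
])
```

Returning the source name alongside the feed makes fallback rates easy to monitor.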
1️⃣8️⃣ Observability
Monitor
- Feed latency
- Candidate source health
- Feature freshness
- Feature fetch latency
- Model inference latency
- Event pipeline lag
- Click/watch metrics
- Negative feedback rate
- Diversity metrics
- Cold-start performance
- A/B experiment guardrails
👉 Interview Memorization
Recommendation observability must track both system health and model quality, including feature freshness and user feedback.
1️⃣9️⃣ Trade-off Table
| Choice | Benefit | Cost |
|---|---|---|
| More candidate sources | Better recall | More latency |
| Bigger ranking model | Better accuracy | Higher serving cost |
| Realtime features | Fresher feed | More pipeline complexity |
| More exploration | Better learning | Short-term relevance risk |
| More diversity | Better experience | May reduce predicted engagement |
| Strict filters | Safer feed | Smaller candidate pool |
👉 Interview Memorization
Recommendation systems constantly trade relevance, freshness, diversity, safety, latency, and compute cost.
2️⃣0️⃣ Best Practices
Practical Rules
- Use multi-stage retrieval and ranking
- Collect high-quality behavior events
- Keep online features fresh
- Separate offline training from online serving
- Use feature stores for consistency
- Add reranking for diversity and safety
- Use exploration for new users and content
- Optimize latency with parallelism and caching
- A/B test every major model change
- Monitor model quality and system health together
Design Principle
Retrieve broadly.
Rank precisely.
Learn continuously.
👉 Interview Memorization
TikTok-like recommendation pipelines win through multi-stage ranking, fresh behavior signals, fast feedback loops, and careful online experimentation.
🧠 Final Staff-Level Answer
👉 Full Interview Answer
A TikTok-like recommendation system uses a multi-stage pipeline because the content corpus is too large to rank everything directly.
The online serving path first runs multiple candidate generators, such as embedding retrieval, collaborative filtering, trending content, follow graph, language and region pools, and fresh uploads.
It then fetches user, video, and context features and applies a heavier ranking model to predict outcomes like watch time, completion, likes, shares, follows, and negative feedback.
A reranking layer adjusts the list for diversity, freshness, creator variety, exploration, and policy constraints.
The offline and realtime pipelines collect events such as impressions, watch time, skips, likes, shares, comments, reports, and replays.
These events update realtime features, feed model training, and close the feedback loop so the feed can adapt quickly.
The feature store must support both offline training and low-latency online serving while keeping feature definitions consistent.
Cold start is handled with metadata, content embeddings, popularity priors, and controlled exploration.
The main trade-off is relevance and freshness versus latency, diversity, safety, and compute cost.
⭐ Final Insight
The core of a TikTok-style recommendation pipeline is not:
"Train one model to rank videos"
It is:
- Event Collection
- Candidate Generation
- Feature Store
- Ranking
- Reranking
- Realtime Feedback
- Experimentation
The most important line:
Retrieve broadly.
Rank precisely.
Learn continuously.
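Recap (translated from Chinese)
🎯 How TikTok Builds Recommendation Pipelines (TikTok-style recommendation system pipelines)
Core Understanding
Short-video recommendation is not simple ranking.
It is a low-latency, multi-stage system driven by strong feedback loops.
The core problem:
From a massive pool of videos,
quickly retrieve a small set of candidates,
score them with a stronger model,
and keep learning from user feedback.
High-level Architecture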
User opens feed
↓
Candidate Generation
↓
Feature Fetch
↓
Ranking Model
↓
Reranking / Filters
↓
Feed Response
Data pipeline:
User Events
↓
Event Stream
↓
Feature Pipelines
↓
Model Training
↓
Model Serving
Event Collection
Recommendation systems depend on user behavior:
- impression
- watch time
- completion rate
- like
- share
- comment
- follow
- skip
- not interested
- report
Watch behavior is usually a very strong signal.
Feature Generation
Common features:
User Features
- interests
- watch history
- skip
- language
- region
- follow graph
Video Features
- embedding
- topic
- creator
- freshness
- engagement rate
- safety labels
Context Features
- time of day
- session length
- current feed position
- recent interactions
Candidate Generation
Ranking every video directly is not feasible.
The first stage retrieves candidates:
Huge Video Corpus
↓
Multiple Candidate Generators
↓
Candidate Pool
Candidate sources:
- embedding similarity
- collaborative filtering
- trending
- follow graph
- fresh uploads
- language / region pools
Ranking
Ranking scores the candidate videos at a finer granularity.
Prediction targets may include:
- watch time
- completion probability
- like probability
- share probability
- comment probability
- negative feedback probability
Reranking
The highest model score does not necessarily determine the final order.
Reranking considers:
- diversity
- freshness
- creator variety
- topic variety
- safety
- exploration
- avoid repetition
Real-time Feedback
Short-video recommendation needs to respond quickly to user behavior.
User watches cooking videos
↓
Realtime features update
↓
Next feed has more cooking videos
Cold Start
New users:
- region
- language
- device
- signup interests
- local popular content
New videos:
- creator history
- video metadata
- visual/audio embedding
- small test audience
- early engagement
Exploration vs Exploitation
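Exploitation:
Show the content the user is most likely to enjoy right now.
Exploration:
Try uncertain content to learn new interests.
Without exploration, the feed keeps getting narrower.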
Interview Answer Template
A TikTok-like recommendation system uses a multi-stage pipeline.
Candidate generation first retrieves a manageable set of videos from a huge corpus using embeddings, collaborative filtering, trending pools, follow graph, fresh uploads, and regional or language pools.
Ranking then scores candidates with richer user, video, and context features to predict watch time, completion, engagement, and negative feedback.
Reranking adjusts the final list for diversity, freshness, creator variety, exploration, and safety constraints.
The system depends on realtime event collection from impressions, watch time, skips, likes, shares, comments, and reports.
These events update features, train models, and create fast feedback loops.
The key trade-off is relevance and freshness versus latency, diversity, safety, and compute cost.
Final Summary
Retrieve broadly.
Rank precisely.
Learn continuously.
Core principle:
Candidate Generation + Ranking + Reranking + Realtime Feedback