
🎯 How TikTok Builds Recommendation Pipelines

1️⃣ Core Framework

When discussing TikTok-style recommendation pipelines, I frame the discussion around:

Event collection
Feature generation
Candidate generation
Ranking
Reranking and policy filters
Real-time feedback loops
Exploration vs exploitation
Trade-offs: relevance vs freshness vs latency vs diversity

2️⃣ The Core Problem

A short-video recommendation system must choose a small set of videos from a massive content pool within a tight latency budget.

Challenges

Huge video corpus
Rapidly changing user interests
New content cold start
Low latency serving
Feedback loops
Creator fairness
Safety and policy constraints
Avoiding repetitive feeds

👉 Interview Memorization

A TikTok-like recommendation system is a low-latency multi-stage ranking system driven by fast feedback loops from user behavior.

3️⃣ High-level Architecture

Online Serving Path

User opens feed

↓

Candidate Generation

↓

Feature Fetch

↓

Ranking Model

↓

Reranking / Filters

↓

Feed Response
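The online path above can be sketched as one function that chains the four stages. This is a minimal illustration, assuming each stage is passed in as a plain callable; `serve_feed`, `Candidate`, and the stage signatures are hypothetical names, not a real TikTok API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Candidate:
    video_id: str
    score: float = 0.0


def serve_feed(
    user_id: str,
    generate: Callable[[str], list[str]],
    fetch_features: Callable[[str, list[str]], dict[str, dict]],
    rank: Callable[[dict], float],
    rerank: Callable[[list[Candidate]], list[Candidate]],
    feed_size: int = 10,
) -> list[str]:
    # 1. Candidate generation: cheap, recall-oriented retrieval.
    video_ids = generate(user_id)
    # 2. Feature fetch: one batched lookup for all candidates.
    features = fetch_features(user_id, video_ids)
    # 3. Ranking: a heavier model scores each candidate; sort best first.
    ranked = sorted(
        (Candidate(v, rank(features[v])) for v in video_ids),
        key=lambda c: c.score,
        reverse=True,
    )
    # 4. Reranking / filters, then truncate to the response size.
    return [c.video_id for c in rerank(ranked)[:feed_size]]


# Usage with toy stages:
ctr = {"a": 0.1, "b": 0.5, "c": 0.3}
feed = serve_feed(
    "u1",
    generate=lambda u: list(ctr),
    fetch_features=lambda u, vids: {v: {"ctr": ctr[v]} for v in vids},
    rank=lambda f: f["ctr"],
    rerank=lambda cands: cands,  # identity reranker for the demo
    feed_size=2,
)
print(feed)  # ['b', 'c']
```

Injecting the stages keeps each one independently replaceable, which matters later for fallbacks (a simpler `rank`) and experiments (a new `rerank`).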

Offline and Realtime Pipelines

User Events

↓

Event Stream

↓

Feature Pipelines

↓

Model Training

↓

Model Serving

👉 Interview Memorization

Recommendation systems separate online low-latency serving from offline and realtime pipelines that generate features and train models.

4️⃣ Event Collection

Recommendations improve by learning from behavior data.

Common Events

Impression
Watch time
Completion rate
Like
Share
Comment
Follow
Skip
Not interested
Report
Replay

Event Flow

Client Event

↓

Event Collector

↓

Stream Platform

↓

Feature Store / Data Lake
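A minimal sketch of the collector step, assuming a hypothetical `ClientEvent` schema and using an in-memory `deque` as a stand-in for a real stream platform. It shows validation happening at the edge, before bad events can pollute features or training labels.

```python
import json
import time
from collections import deque
from dataclasses import asdict, dataclass, field

# Hypothetical event vocabulary mirroring the "Common Events" list above.
ALLOWED_EVENTS = {
    "impression", "watch", "like", "share", "comment", "follow",
    "skip", "not_interested", "report", "replay",
}


@dataclass
class ClientEvent:
    user_id: str
    video_id: str
    event_type: str
    watch_ms: int = 0  # only meaningful for "watch" events
    ts: float = field(default_factory=time.time)


class EventCollector:
    """Validates client events and appends them, serialized, to a stream."""

    def __init__(self) -> None:
        self.stream: deque = deque()  # in-memory stand-in for a stream platform
        self.rejected = 0

    def collect(self, event: ClientEvent) -> bool:
        # Reject malformed events at the edge so they never reach features or labels.
        if event.event_type not in ALLOWED_EVENTS or event.watch_ms < 0:
            self.rejected += 1
            return False
        self.stream.append(json.dumps(asdict(event)))
        return True


collector = EventCollector()
collector.collect(ClientEvent("u1", "v1", "watch", watch_ms=12_000))
collector.collect(ClientEvent("u1", "v1", "teleport"))  # unknown type: rejected
print(len(collector.stream), collector.rejected)  # 1 1
```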

👉 Interview Memorization

Short-video recommendations depend heavily on high-quality event collection because watch behavior is often the strongest feedback signal.

5️⃣ Feature Generation

Features describe users, videos, context, and sessions.

User Features

Interests
Watch history
Skips
Language
Region
Device
Follow graph
Long-term preferences

Video Features

Embedding
Topic
Language
Creator
Freshness
Engagement rate
Safety labels
Audio or visual signals

Context Features

Time of day
Network
Session length
Current feed position
Recent interactions

👉 Interview Memorization

Recommendation quality depends on fresh user, item, and context features that capture both long-term preference and current session intent.

6️⃣ Candidate Generation

Candidate generation narrows millions of videos down to hundreds or thousands.

Candidate Sources

Similar video embeddings
User interest clusters
Follow graph
Trending videos
Fresh uploads
Geo or language pools
Collaborative filtering
Creator affinity

Architecture

Huge Video Corpus

↓

Multiple Candidate Generators

↓

Candidate Pool
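Merging the outputs of several generators is mostly bookkeeping: deduplicate, cap each source so no single one dominates recall, and remember which source produced each video (useful for debugging and per-source metrics). A sketch under those assumptions; `merge_candidates` and the per-source quotas are hypothetical.

```python
from typing import Callable, Iterable

CandidateSource = Callable[[str], Iterable[str]]


def merge_candidates(
    user_id: str,
    sources: dict[str, CandidateSource],
    quota: dict[str, int],
    max_total: int,
) -> list[tuple[str, str]]:
    """Union candidates from several sources, deduplicated and capped per source.

    Returns (video_id, source) pairs; the first source to produce a video keeps it.
    """
    seen: set[str] = set()
    pool: list[tuple[str, str]] = []
    for source, generate in sources.items():
        cap = quota.get(source, 0)
        taken = 0
        for vid in generate(user_id):
            if taken >= cap or len(pool) >= max_total:
                break
            if vid in seen:
                continue  # already retrieved by an earlier source
            seen.add(vid)
            pool.append((vid, source))
            taken += 1
    return pool


sources = {
    "embedding": lambda u: ["v1", "v2", "v3"],
    "trending": lambda u: ["v2", "v9"],
    "fresh": lambda u: ["v7"],
}
pool = merge_candidates("u1", sources, {"embedding": 2, "trending": 2, "fresh": 1}, max_total=10)
print(pool)  # [('v1', 'embedding'), ('v2', 'embedding'), ('v9', 'trending'), ('v7', 'fresh')]
```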

👉 Interview Memorization

Candidate generation optimizes recall by quickly retrieving a manageable set of possibly relevant videos from a huge corpus.

7️⃣ Ranking

Ranking scores each candidate with richer models and features.

Predicted Objectives

Watch time
Completion probability
Like probability
Share probability
Comment probability
Follow probability
Negative feedback probability
Long-term satisfaction

Ranking Flow

Candidate Videos

↓

Fetch Features

↓

Model Scores

↓

Sorted Candidates
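Ranking models commonly predict several objectives at once, and these then have to be combined into one sort key. One simple, hypothetical way is a weighted sum in which negative feedback carries a negative weight; the weights below are illustrative only and would normally be tuned through experiments.

```python
# Hypothetical per-objective weights; real values come from experimentation.
WEIGHTS = {
    "p_complete": 1.0,
    "p_like": 2.0,
    "p_share": 4.0,
    "p_follow": 6.0,
    "p_negative": -10.0,  # penalize likely "not interested" / reports
}


def fuse_score(preds: dict[str, float], expected_watch_s: float) -> float:
    """Combine multi-task model outputs into a single ranking score."""
    score = expected_watch_s / 60.0  # expected watch time, in minutes
    for name, weight in WEIGHTS.items():
        score += weight * preds.get(name, 0.0)  # missing heads contribute 0
    return score


a = fuse_score({"p_complete": 0.5, "p_like": 0.1}, expected_watch_s=30)
b = fuse_score({"p_complete": 0.6, "p_like": 0.1, "p_negative": 0.05}, expected_watch_s=30)
print(round(a, 2), round(b, 2))  # 1.2 0.8
```

Video `b` has the higher completion probability but ranks lower, because its predicted negative feedback outweighs the gain.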

👉 Interview Memorization

Ranking spends more compute on a smaller candidate set to estimate engagement, satisfaction, and risk signals more accurately.

8️⃣ Reranking

The top model scores are not always the final feed order.

Reranking Goals

Diversity
Freshness
Creator variety
Topic variety
Safety constraints
Business rules
Avoid repeated content
Exploration quota

Example

Top 10 are all the same topic

↓

Reranker mixes topics and creators
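The reranking example above can be sketched as a greedy pass: repeatedly take the highest-scoring remaining video whose topic and creator were not used in the last few slots. This is one simple technique among several (maximal marginal relevance is a common alternative); `diversify` and its `window` parameter are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ranked:
    video_id: str
    topic: str
    creator: str
    score: float


def diversify(items: list[Ranked], window: int = 3) -> list[Ranked]:
    """Greedy rerank: prefer the best item whose topic and creator are absent
    from the last `window` picks; if none qualifies, take the best remaining."""
    if window < 1:
        raise ValueError("window must be >= 1")
    remaining = sorted(items, key=lambda r: r.score, reverse=True)
    out: list[Ranked] = []
    while remaining:
        recent = out[-window:]
        recent_topics = {r.topic for r in recent}
        recent_creators = {r.creator for r in recent}
        pick = next(
            (r for r in remaining
             if r.topic not in recent_topics and r.creator not in recent_creators),
            remaining[0],  # nothing diverse left: fall back to the top score
        )
        out.append(pick)
        remaining.remove(pick)
    return out


items = [
    Ranked("a", "cooking", "c1", 0.9),
    Ranked("b", "cooking", "c2", 0.8),
    Ranked("c", "cooking", "c3", 0.7),
    Ranked("d", "travel", "c4", 0.6),
    Ranked("e", "pets", "c5", 0.5),
]
print([r.video_id for r in diversify(items)])  # ['a', 'd', 'e', 'b', 'c']
```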

👉 Interview Memorization

Reranking adjusts the model-sorted list to improve diversity, safety, freshness, and overall feed experience.

9️⃣ Real-time Feedback Loop

Short-video systems react quickly to behavior.

Example

User watches cooking videos to completion

↓

Realtime features update

↓

Next feed request includes more cooking candidates

Fresh Signals

Recent watch completions
Recent skips
Session topic interest
Last few interactions
Short-term embedding updates
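One way to turn these fresh signals into a short-term interest feature is an exponentially decayed score per topic, updated on every event. A sketch, assuming hypothetical signal weights and a 10-minute half-life; a real system would typically keep this state in a streaming job or an online feature store rather than in process memory.

```python
import math


class SessionInterest:
    """Per-topic short-term interest whose weight halves every half_life_s."""

    # Hypothetical signal weights: positive signals add, skips subtract.
    WEIGHTS = {"complete": 1.0, "like": 1.5, "replay": 1.2, "skip": -0.5}

    def __init__(self, half_life_s: float = 600.0) -> None:
        self.decay = math.log(2) / half_life_s
        self.scores: dict[str, float] = {}
        self.last_ts: dict[str, float] = {}

    def _decayed(self, topic: str, now: float) -> float:
        elapsed = now - self.last_ts.get(topic, now)
        return self.scores.get(topic, 0.0) * math.exp(-self.decay * elapsed)

    def update(self, topic: str, signal: str, now: float) -> None:
        # Decay the old score to "now", then add the new event's weight.
        self.scores[topic] = self._decayed(topic, now) + self.WEIGHTS.get(signal, 0.0)
        self.last_ts[topic] = now

    def top_topics(self, now: float, k: int = 3) -> list[str]:
        return sorted(self.scores, key=lambda t: self._decayed(t, now), reverse=True)[:k]


s = SessionInterest()
s.update("cooking", "complete", now=0.0)
s.update("cooking", "complete", now=60.0)
s.update("gaming", "skip", now=60.0)
print(s.top_topics(now=60.0, k=1))  # ['cooking']
```

The top topics could then steer which candidate generators or interest clusters the next request queries.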

👉 Interview Memorization

Real-time feedback loops let the feed adapt within a session instead of waiting for offline model retraining.

🔟 Feature Store

Feature stores serve both training and online inference.

Two Views

Offline Feature Store
→ training data

Online Feature Store
→ low-latency serving

Requirements

Low-latency reads
Fresh updates
Backfill support
Consistent definitions
Point-in-time correctness for training
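Point-in-time correctness means a training example for an impression at time t may only use feature values that existed at t. The hypothetical `OfflineFeatureLog` below keeps timestamped values per entity and answers "as of t" lookups with a binary search, so a later update cannot leak into an earlier example. An online store, by contrast, would usually keep only the latest value.

```python
import bisect
from collections import defaultdict
from typing import Optional


class OfflineFeatureLog:
    """Timestamped feature values per entity, for building training rows."""

    def __init__(self) -> None:
        # entity -> sorted write timestamps, and values aligned with them
        self._ts: dict[str, list[float]] = defaultdict(list)
        self._vals: dict[str, list[float]] = defaultdict(list)

    def write(self, entity: str, ts: float, value: float) -> None:
        i = bisect.bisect_right(self._ts[entity], ts)  # keep timestamps sorted
        self._ts[entity].insert(i, ts)
        self._vals[entity].insert(i, value)

    def as_of(self, entity: str, ts: float) -> Optional[float]:
        """Latest value written at or before ts; never looks into the future."""
        i = bisect.bisect_right(self._ts[entity], ts)
        return self._vals[entity][i - 1] if i > 0 else None


# A video's engagement rate is updated at t=100 and t=200.
log = OfflineFeatureLog()
log.write("v1", 100.0, 0.02)
log.write("v1", 200.0, 0.08)
# A training example for an impression at t=150 must see 0.02, not 0.08.
print(log.as_of("v1", 150.0), log.as_of("v1", 50.0))  # 0.02 None
```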

👉 Interview Memorization

Feature stores keep training and serving features consistent while supporting low-latency online reads and large-scale offline training.

1️⃣1️⃣ Cold Start

Cold start happens for new users or new videos.

New User

Use:

Region
Language
Device
Signup interests
Popular local content
Exploration

New Video

Use:

Creator history
Video metadata
Audio and visual embeddings
Small test audiences
Early engagement signals
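For new videos, one common smoothing approach is to blend early engagement with a prior, for example a rate derived from the creator's history. A minimal sketch using a Beta prior; `prior_strength` (how many pseudo-impressions the prior is worth) is an assumed tuning knob, not a known production value.

```python
def smoothed_rate(
    successes: int,
    trials: int,
    prior_mean: float,
    prior_strength: float = 50.0,
) -> float:
    """Beta-prior smoothing of an engagement rate.

    With few real impressions the estimate stays near prior_mean (e.g. the
    creator's historical rate); it moves toward the observed rate as
    impressions accumulate.
    """
    alpha = prior_mean * prior_strength          # pseudo-successes
    beta = (1.0 - prior_mean) * prior_strength   # pseudo-failures
    return (successes + alpha) / (trials + alpha + beta)


# 3 likes out of 5 test impressions is a raw rate of 0.6, which is mostly noise.
print(round(smoothed_rate(3, 5, prior_mean=0.1), 3))       # 0.145
print(round(smoothed_rate(600, 1000, prior_mean=0.1), 3))  # 0.576
```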

👉 Interview Memorization

Cold start is handled with metadata, content embeddings, popularity priors, and controlled exploration to gather early feedback.

1️⃣2️⃣ Exploration vs Exploitation

Showing only known favorites can trap the feed in a narrow loop.

Exploitation

Show videos likely to perform well now.

Exploration

Try uncertain videos to learn user interest and evaluate new content.

Why Exploration Matters

Discovers new interests
Tests new videos
Helps new creators
Prevents filter bubbles
Improves long-term quality
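A minimal way to implement an exploration quota is epsilon-greedy slot filling: each feed slot is drawn from an exploration pool with probability epsilon and from the ranked list otherwise. This sketch assumes the two pools are disjoint and already ordered; bandit methods such as Thompson sampling are a more sample-efficient alternative.

```python
import random
from typing import Optional


def blend_feed(
    exploit: list[str],
    explore: list[str],
    size: int,
    epsilon: float = 0.1,
    rng: Optional[random.Random] = None,
) -> list[str]:
    """Epsilon-greedy slot filling over two disjoint, pre-ordered pools."""
    rng = rng if rng is not None else random.Random()
    exploit_it, explore_it = iter(exploit), iter(explore)
    feed: list[str] = []
    while len(feed) < size:
        # With probability epsilon this slot explores; otherwise it exploits.
        first = explore_it if rng.random() < epsilon else exploit_it
        second = exploit_it if first is explore_it else explore_it
        item = next(first, None)
        if item is None:  # chosen pool exhausted: use the other one
            item = next(second, None)
        if item is None:  # both pools exhausted
            break
        feed.append(item)
    return feed


print(blend_feed(["a", "b", "c"], ["x"], size=3, epsilon=1.0))  # ['x', 'a', 'b']
mixed = blend_feed(["a", "b", "c"], ["x", "y"], size=4, epsilon=0.2, rng=random.Random(7))
print(mixed)
```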

👉 Interview Memorization

Recommendation systems need exploration to learn, avoid stale feeds, and give new content a chance.

1️⃣3️⃣ Safety and Policy Filters

Recommendation quality includes safety.

Filters

Policy violations
Age restrictions
Region restrictions
Copyright restrictions
Blocked creators
User negative preferences
Sensitive content limits

Where Applied

Before ranking

During reranking

Before final response
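Hard filters are cheap boolean checks, which is why the same predicate can run before ranking, during reranking, and again before the response. A sketch covering a subset of the filters listed above, assuming hypothetical `Video` and `Viewer` records that carry the relevant labels; soft constraints such as sensitive-content limits would more likely live in the reranker.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Video:
    video_id: str
    creator: str
    min_age: int = 0
    blocked_regions: frozenset[str] = frozenset()
    policy_violation: bool = False


@dataclass
class Viewer:
    age: int
    region: str
    blocked_creators: set[str] = field(default_factory=set)
    not_interested: set[str] = field(default_factory=set)


def is_eligible(video: Video, viewer: Viewer) -> bool:
    """Hard filters only: any single failing check removes the video."""
    return not (
        video.policy_violation
        or viewer.age < video.min_age
        or viewer.region in video.blocked_regions
        or video.creator in viewer.blocked_creators
        or video.video_id in viewer.not_interested
    )


def apply_filters(videos: list[Video], viewer: Viewer) -> list[Video]:
    return [v for v in videos if is_eligible(v, viewer)]


viewer = Viewer(age=15, region="DE", blocked_creators={"spammer"})
videos = [
    Video("ok", "c1"),
    Video("adult", "c2", min_age=18),
    Video("geo", "c3", blocked_regions=frozenset({"DE"})),
    Video("violation", "c4", policy_violation=True),
    Video("blocked", "spammer"),
]
print([v.video_id for v in apply_filters(videos, viewer)])  # ['ok']
```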

👉 Interview Memorization

Safety and policy filters must be part of the recommendation pipeline, not an afterthought.

1️⃣4️⃣ Latency Budget

Feed serving must be fast.

Example Budget

Candidate generation: 30 ms

Feature fetch: 40 ms

Ranking: 50 ms

Reranking: 20 ms

Response assembly: 10 ms

Optimization

Parallel candidate generators
Cached features
Approximate nearest neighbor search
Model distillation
Batch inference
Early timeout fallback
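The example budget above sums to 150 ms (30 + 40 + 50 + 20 + 10). Running candidate generators in parallel under a shared deadline, and dropping any that are late or failing, keeps one slow source from consuming the whole budget. A sketch using Python's standard `concurrent.futures`; `fan_out` and the source names are hypothetical, and in production `timeout_s` would be on the order of the 30 ms candidate-generation budget.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait
from typing import Callable


def fan_out(
    sources: dict[str, Callable[[str], list[str]]],
    user_id: str,
    timeout_s: float,
) -> dict[str, list[str]]:
    """Call every candidate source in parallel; keep only on-time successes."""
    if not sources:
        return {}
    pool = ThreadPoolExecutor(max_workers=len(sources))
    futures = {pool.submit(fn, user_id): name for name, fn in sources.items()}
    done, _late = wait(futures, timeout=timeout_s)
    results = {}
    for fut in done:
        if fut.exception() is None:  # skip sources that raised
            results[futures[fut]] = fut.result()
    # Return without blocking on late sources. Already-running calls finish in
    # the background (Python threads cannot be killed); queued ones are cancelled.
    pool.shutdown(wait=False, cancel_futures=True)
    return results


def fast(user_id: str) -> list[str]:
    return ["v1", "v2"]


def slow(user_id: str) -> list[str]:
    time.sleep(1.0)  # simulates a generator that blows its budget
    return ["v3"]


def broken(user_id: str) -> list[str]:
    raise RuntimeError("ANN index unavailable")


result = fan_out({"fast": fast, "slow": slow, "broken": broken}, "u1", timeout_s=0.2)
print(result)
```

With these toy sources the result normally contains only the `fast` source; the request still returns a usable candidate set.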

👉 Interview Memorization

Recommendation serving is latency-constrained, so candidate retrieval, feature fetching, and model inference must be optimized and often run in parallel.

1️⃣5️⃣ Training Pipeline

Models are trained from historical events.

Flow

Raw Events

↓

Clean and Join

↓

Create Labels

↓

Generate Training Examples

↓

Train Model

↓

Evaluate

↓

Deploy

Important Issues

Delayed labels
Position bias
Feedback loops
Data leakage
Point-in-time feature correctness
Online/offline metric mismatch
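The "Create Labels" step and the delayed-label issue can be illustrated together: impressions are joined with watch events, but only after an attribution window has closed, because labeling too early turns not-yet-arrived positives into false negatives. A sketch, assuming a hypothetical one-hour window and a 90% watch ratio as the completion threshold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Impression:
    imp_id: str
    video_id: str
    duration_s: float
    ts: float  # when the video was shown


@dataclass(frozen=True)
class Watch:
    imp_id: str
    watched_s: float


def make_examples(
    impressions: list[Impression],
    watches: list[Watch],
    now: float,
    label_window_s: float = 3600.0,  # hypothetical attribution window
    complete_ratio: float = 0.9,     # hypothetical "completed" threshold
) -> list[tuple[str, float, bool]]:
    """Return (imp_id, watch_ratio, completed) rows for impressions whose
    label window has closed. Younger impressions are held back because their
    watch events may still be in flight (delayed labels)."""
    watched: dict[str, float] = {}
    for w in watches:  # keep the longest watch per impression
        watched[w.imp_id] = max(watched.get(w.imp_id, 0.0), w.watched_s)
    rows = []
    for imp in impressions:
        if now - imp.ts < label_window_s:
            continue
        # Missing watch event -> ratio 0 (a negative); cap replays at 1.0.
        ratio = min(watched.get(imp.imp_id, 0.0) / imp.duration_s, 1.0)
        rows.append((imp.imp_id, ratio, ratio >= complete_ratio))
    return rows


imps = [
    Impression("i1", "v1", 20.0, ts=0.0),
    Impression("i2", "v2", 30.0, ts=0.0),
    Impression("i3", "v3", 15.0, ts=5000.0),  # too recent to label at now=6000
]
ws = [Watch("i1", 19.0), Watch("i2", 3.0), Watch("i3", 15.0)]
rows = make_examples(imps, ws, now=6000.0)
print(rows)  # [('i1', 0.95, True), ('i2', 0.1, False)]
```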

👉 Interview Memorization

Recommendation training pipelines must handle delayed labels, bias, feature correctness, and online/offline metric mismatch.

1️⃣6️⃣ Experimentation

Recommendation changes must be tested carefully.

Experiment Metrics

Watch time
Retention
Completion rate
Likes and shares
Negative feedback
Diversity
Creator ecosystem health
Long-term satisfaction

Rollout

Offline evaluation

↓

Small A/B test

↓

Ramp gradually

↓

Monitor guardrails
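Gradual ramps are usually built on deterministic hashing, so a user stays in the same arm as exposure grows. A sketch, assuming 1,000 buckets and a simple relative-change guardrail; `bucket`, `in_treatment`, and `guardrail_ok` are hypothetical helpers, and a real rollout would also apply a statistical significance test, which is omitted here.

```python
import hashlib

BUCKETS = 1000  # each bucket is 0.1% of users


def bucket(user_id: str, experiment: str) -> int:
    """Stable bucket in [0, BUCKETS): same user + experiment -> same bucket."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % BUCKETS


def in_treatment(user_id: str, experiment: str, ramp_pct: float) -> bool:
    """Raising ramp_pct (1 -> 5 -> 20 -> 50) only adds users, never reshuffles."""
    return bucket(user_id, experiment) < ramp_pct / 100 * BUCKETS


def guardrail_ok(control: float, treatment: float, max_rel_increase: float) -> bool:
    """E.g. block the ramp if the negative-feedback rate rises too much."""
    return treatment <= control * (1 + max_rel_increase)


users = [f"u{i}" for i in range(10_000)]
share = sum(in_treatment(u, "rank_v2", 20) for u in users) / len(users)
print(round(share, 2))
print(guardrail_ok(control=0.020, treatment=0.025, max_rel_increase=0.10))  # False
```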

👉 Interview Memorization

Recommendation systems require experimentation because offline metrics do not always predict real user behavior.

1️⃣7️⃣ Failure Handling

Common Failures

Candidate generator timeout
Feature store latency
Ranking model failure
Bad model rollout
Event pipeline lag
Feed repetition
Policy filter failure

Fallbacks

Popular local videos
Cached feed
Simpler model
Fewer candidate sources
Safe default policy filters
Roll back bad model
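These fallbacks can be chained so that a request degrades step by step instead of failing. A sketch, assuming each strategy is a zero-argument callable; `first_successful` is a hypothetical helper, and the cheapest fallback (for example popular local videos) should itself be served from a cache.

```python
import logging
from typing import Callable, Sequence

logger = logging.getLogger("feed")


def first_successful(
    strategies: Sequence[tuple[str, Callable[[], list[str]]]],
    min_items: int = 1,
) -> tuple[str, list[str]]:
    """Try strategies from best to cheapest; return the first usable result."""
    for name, strategy in strategies:
        try:
            items = strategy()
        except Exception:  # degrade to the next strategy instead of failing
            logger.warning("feed strategy %r failed", name, exc_info=True)
            continue
        if len(items) >= min_items:
            return name, items
    return "empty", []


def full_model() -> list[str]:
    raise TimeoutError("ranking model timed out")


def light_model() -> list[str]:
    return []  # e.g. a distilled model ran but every candidate was filtered


def popular_local() -> list[str]:
    return ["p1", "p2"]  # would normally come from a precomputed cache


choice = first_successful(
    [("full", full_model), ("light", light_model), ("popular", popular_local)]
)
print(choice)  # ('popular', ['p1', 'p2'])
```

Returning the strategy name alongside the items makes it easy to count how often each fallback fires, which feeds directly into observability.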

👉 Interview Memorization

Recommendation systems need graceful fallbacks such as cached feeds, popular content, simpler models, and model rollback.

1️⃣8️⃣ Observability

Monitor

Feed latency
Candidate source health
Feature freshness
Feature fetch latency
Model inference latency
Event pipeline lag
Click/watch metrics
Negative feedback rate
Diversity metrics
Cold-start performance
A/B experiment guardrails

👉 Interview Memorization

Recommendation observability must track both system health and model quality, including feature freshness and user feedback.

1️⃣9️⃣ Trade-off Table

Choice	Benefit	Cost
More candidate sources	Better recall	More latency
Bigger ranking model	Better accuracy	Higher serving cost
Realtime features	Fresher feed	More pipeline complexity
More exploration	Better learning	Short-term relevance risk
More diversity	Better experience	May reduce predicted engagement
Strict filters	Safer feed	Smaller candidate pool

👉 Interview Memorization

Recommendation systems constantly trade relevance, freshness, diversity, safety, latency, and compute cost.

2️⃣0️⃣ Best Practices

Practical Rules

Use multi-stage retrieval and ranking
Collect high-quality behavior events
Keep online features fresh
Separate offline training from online serving
Use feature stores for consistency
Add reranking for diversity and safety
Use exploration for new users and content
Optimize latency with parallelism and caching
A/B test every major model change
Monitor model quality and system health together

Design Principle

Retrieve broadly.

Rank precisely.

Learn continuously.

👉 Interview Memorization

TikTok-like recommendation pipelines win through multi-stage ranking, fresh behavior signals, fast feedback loops, and careful online experimentation.

🧠 Final Staff-Level Answer

👉 Full Interview Answer

A TikTok-like recommendation system uses a multi-stage pipeline because the content corpus is too large to rank everything directly.

The online serving path first runs multiple candidate generators, such as embedding retrieval, collaborative filtering, trending content, follow graph, language and region pools, and fresh uploads.

It then fetches user, video, and context features and applies a heavier ranking model to predict outcomes like watch time, completion, likes, shares, follows, and negative feedback.

A reranking layer adjusts the list for diversity, freshness, creator variety, exploration, and policy constraints.

The offline and realtime pipelines collect events such as impressions, watch time, skips, likes, shares, comments, reports, and replays.

These events update realtime features, feed model training, and close the feedback loop so the feed can adapt quickly.

The feature store must support both offline training and low-latency online serving while keeping feature definitions consistent.

Cold start is handled with metadata, content embeddings, popularity priors, and controlled exploration.

The main trade-off is relevance and freshness versus latency, diversity, safety, and compute cost.

⭐ Final Insight

The core of a TikTok-style recommendation pipeline is not:

"Train a model to rank"

but rather:

Event Collection

Candidate Generation

Feature Store

Ranking

Reranking

Realtime Feedback

Experimentation

The single most important line:

Retrieve broadly.

Rank precisely.

Learn continuously.

Chinese Section

🎯 How TikTok Builds Recommendation Pipelines (TikTok-Style Recommendation System Pipeline)

Core Understanding

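Short-video recommendation is not simple sorting.

It is a low-latency, multi-stage system with strong feedback loops.

Core problem:

From a massive pool of videos

quickly find a small set of candidates

then rank them with a stronger model

and keep learning from user feedback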

High-level Architecture

User opens feed

↓

Candidate Generation

↓

Feature Fetch

↓

Ranking Model

↓

Reranking / Filters

↓

Feed Response

Data pipeline:

User Events

↓

Event Stream

↓

Feature Pipelines

↓

Model Training

↓

Model Serving

Event Collection

Recommendation systems depend on user behavior:

impression
watch time
completion rate
like
share
comment
follow
skip
not interested
report

Watch behavior is usually a very strong signal.

Feature Generation

Common features:

User Features

interests
watch history
skip
language
region
follow graph

Video Features

embedding
topic
creator
freshness
engagement rate
safety labels

Context Features

time of day
session length
current feed position
recent interactions

Candidate Generation

You cannot rank every video directly.

The first stage retrieves candidates:

Huge Video Corpus

↓

Multiple Candidate Generators

↓

Candidate Pool

Candidate sources:

embedding similarity
collaborative filtering
trending
follow graph
fresh uploads
language / region pools

Ranking

Ranking scores the candidate videos in finer detail.

Prediction targets may include:

watch time
completion probability
like probability
share probability
comment probability
negative feedback probability

Reranking

The highest model score does not necessarily determine the final order.

Reranking considers:

diversity
freshness
creator variety
topic variety
safety
exploration
avoid repetition

Real-time Feedback

Short-video recommendation needs to respond quickly to user behavior.

User watches cooking videos

↓

Realtime features update

↓

Next feed has more cooking videos

Cold Start

New users:

region
language
device
signup interests
local popular content

New videos:

creator history
video metadata
visual/audio embedding
small test audience
early engagement

Exploration vs Exploitation

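Exploitation:

Show the content the user is most likely to enjoy right now

Exploration:

Try uncertain content to learn new interests

Without exploration, the feed keeps getting narrower.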

Interview Answer Template

A TikTok-like recommendation system uses a multi-stage pipeline.

Candidate generation first retrieves a manageable set of videos from a huge corpus using embeddings, collaborative filtering, trending pools, follow graph, fresh uploads, and regional or language pools.

Ranking then scores candidates with richer user, video, and context features to predict watch time, completion, engagement, and negative feedback.

Reranking adjusts the final list for diversity, freshness, creator variety, exploration, and safety constraints.

The system depends on realtime event collection from impressions, watch time, skips, likes, shares, comments, and reports.

These events update features, train models, and create fast feedback loops.

The key trade-off is relevance and freshness versus latency, diversity, safety, and compute cost.

Final Summary

Retrieve broadly.

Rank precisely.

Learn continuously.

Core principle:

Candidate Generation + Ranking + Reranking + Realtime Feedback