🎯 Design Recommendation System
1️⃣ Core Framework
When discussing Recommendation System design, I frame it as:
- User behavior collection
- Offline feature and model pipeline
- Candidate generation
- Ranking and re-ranking
- Online serving architecture
- Feedback loop and experimentation
- Cold start and freshness
- Trade-offs: relevance vs latency vs diversity
2️⃣ Core Requirements
Functional Requirements
- Recommend items to users
- Support personalized recommendations
- Support trending / popular recommendations
- Support real-time behavior signals
- Support multiple surfaces:
  - Home feed
  - Product recommendations
  - Video recommendations
  - “You may also like”
- Support feedback:
  - Click
  - View
  - Like
  - Purchase
  - Hide
  - Skip
- Support A/B testing
Non-functional Requirements
- Low-latency serving
- High availability
- Scalable offline training
- Near-real-time feature updates
- High relevance
- Diversity and freshness
- Explainability and monitoring
👉 Interview Answer
A recommendation system is a personalized ranking system.
It collects user behavior, generates candidate items, ranks them using user and item features, and continuously improves through feedback.
The main challenge is balancing relevance, latency, diversity, freshness, and system cost.
3️⃣ Main APIs
Get Recommendations
GET /api/recommendations?userId=u123&surface=home&limit=20
Response:
{
"items": [
{
"itemId": "i789",
"score": 0.94,
"reason": "Because you watched similar videos"
}
]
}
Track User Event
POST /api/events
Request:
{
"userId": "u123",
"itemId": "i789",
"eventType": "click",
"timestamp": "2026-05-02T10:00:00Z",
"context": {
"surface": "home",
"device": "mobile"
}
}
Feedback API
POST /api/recommendations/feedback
Request:
{
"userId": "u123",
"itemId": "i789",
"feedback": "not_interested"
}
👉 Interview Answer
I would expose a recommendation serving API and separate event tracking APIs.
Recommendation serving must be low latency, while user events can be processed asynchronously for feature updates, model training, and analytics.
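A minimal sketch of this split, assuming an in-process `queue.Queue` stands in for a durable log such as Kafka. `Event`, `ALLOWED_EVENT_TYPES`, `track_event`, and `EventWorker` are hypothetical names. The request path only validates and enqueues; a background worker does the slower feature and analytics work.

```python
import queue
import threading
from dataclasses import dataclass, field

# Hypothetical set of accepted event types, mirroring the feedback list above.
ALLOWED_EVENT_TYPES = {"click", "view", "like", "purchase", "hide", "skip"}


@dataclass
class Event:
    user_id: str
    item_id: str
    event_type: str
    timestamp: str
    context: dict = field(default_factory=dict)


def track_event(q: "queue.Queue[Event]", event: Event) -> bool:
    """Fast path for POST /api/events: validate, enqueue, return immediately."""
    if event.event_type not in ALLOWED_EVENT_TYPES:
        return False  # reject unknown types instead of polluting training data
    q.put(event)
    return True


class EventWorker(threading.Thread):
    """Slow path: consumes events asynchronously (feature updates, analytics)."""

    def __init__(self, q: "queue.Queue[Event]"):
        super().__init__(daemon=True)
        self.q = q
        self.click_counts = {}  # stand-in for a real-time feature such as item clicks

    def run(self) -> None:
        while True:
            event = self.q.get()
            if event.event_type == "click":
                self.click_counts[event.item_id] = self.click_counts.get(event.item_id, 0) + 1
            self.q.task_done()


events: "queue.Queue[Event]" = queue.Queue()
worker = EventWorker(events)
worker.start()
track_event(events, Event("u123", "i789", "click", "2026-05-02T10:00:00Z", {"surface": "home"}))
track_event(events, Event("u123", "i789", "teleport", "2026-05-02T10:00:01Z"))  # rejected
events.join()  # block until the worker has processed everything enqueued
print(worker.click_counts)  # {'i789': 1}
```

In a real deployment the queue would be a partitioned, replicated log so events survive restarts and can be replayed into training pipelines.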
4️⃣ Data Model
User Profile
CREATE TABLE user_profile (
user_id VARCHAR PRIMARY KEY,
age_group VARCHAR,
region VARCHAR,
language VARCHAR,
interests JSON,
created_at TIMESTAMP,
updated_at TIMESTAMP
);
Item Table
CREATE TABLE item (
item_id VARCHAR PRIMARY KEY,
item_type VARCHAR,
title TEXT,
category VARCHAR,
tags TEXT[], -- array column (PostgreSQL syntax)
creator_id VARCHAR,
status VARCHAR,
created_at TIMESTAMP,
metadata JSON
);
User Event Table
CREATE TABLE user_event (
event_id VARCHAR PRIMARY KEY,
user_id VARCHAR,
item_id VARCHAR,
event_type VARCHAR, -- click, view, like, purchase, hide, skip, ...
timestamp TIMESTAMP,
context JSON
);
Feature Store
CREATE TABLE feature_store (
entity_id VARCHAR,
entity_type VARCHAR,
feature_name VARCHAR,
feature_value JSON,
updated_at TIMESTAMP,
PRIMARY KEY (entity_id, entity_type, feature_name)
);
Recommendation Log
CREATE TABLE recommendation_log (
request_id VARCHAR PRIMARY KEY,
user_id VARCHAR,
item_ids TEXT[], -- served item IDs (PostgreSQL array syntax)
model_version VARCHAR,
experiment_id VARCHAR,
served_at TIMESTAMP
);
👉 Interview Answer
I would store user profiles, item metadata, user interaction events, features, and recommendation logs separately.
User events are the raw signal. The feature store provides online and offline features. Recommendation logs are important for debugging, training data generation, and A/B test analysis.
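To show why recommendation logs matter for training data generation, here is a sketch that joins served impressions with later click events to produce labeled examples. The row shapes are simplified dicts modeled on `recommendation_log` and `user_event`; the 30-minute attribution window and the name `build_training_examples` are assumptions.

```python
from datetime import datetime, timedelta

ATTRIBUTION_WINDOW = timedelta(minutes=30)  # assumed: a click must follow the impression within 30 min


def build_training_examples(recommendation_log, user_events):
    """Label each served (user, item) pair 1 if clicked within the window, else 0."""
    # Index click times by (user_id, item_id) so each impression is a dict lookup.
    clicks = {}
    for e in user_events:
        if e["event_type"] == "click":
            clicks.setdefault((e["user_id"], e["item_id"]), []).append(e["timestamp"])

    examples = []
    for row in recommendation_log:
        served_at = row["served_at"]
        for position, item_id in enumerate(row["item_ids"]):
            times = clicks.get((row["user_id"], item_id), [])
            clicked = any(served_at <= t <= served_at + ATTRIBUTION_WINDOW for t in times)
            examples.append({
                "request_id": row["request_id"],
                "user_id": row["user_id"],
                "item_id": item_id,
                "position": position,  # kept so position bias can be modeled later
                "model_version": row["model_version"],
                "label": int(clicked),
            })
    return examples


t0 = datetime(2026, 5, 2, 10, 0, 0)
log = [{"request_id": "r1", "user_id": "u123", "item_ids": ["i789", "i456"],
        "model_version": "v7", "served_at": t0}]
events = [{"user_id": "u123", "item_id": "i789", "event_type": "click",
           "timestamp": t0 + timedelta(minutes=2)}]
for ex in build_training_examples(log, events):
    print(ex["item_id"], ex["label"])
# i789 1
# i456 0
```

Impressions without a click become negatives, which is only possible because the log records what was actually shown.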
5️⃣ High-Level Architecture
User Events
→ Event Ingestion
→ Stream Processing
→ Feature Store
→ Offline Training Pipeline
→ Model Registry
Online Request
→ Recommendation Service
→ Candidate Generation
→ Feature Fetching
→ Ranking Model
→ Re-ranking
→ Response
→ Logging / Feedback Loop
Two Main Pipelines
Offline Pipeline
- Process historical data
- Build embeddings
- Train ranking models
- Generate candidate indexes
- Update feature store
Online Pipeline
- Receive recommendation request
- Fetch user context
- Generate candidates
- Rank candidates
- Apply business rules
- Return recommendations
👉 Interview Answer
I would separate recommendation into offline and online pipelines.
Offline pipelines process historical behavior, train models, generate embeddings, and compute features.
Online pipelines serve recommendations in real time using candidate generation, feature fetching, ranking, and re-ranking.
6️⃣ Candidate Generation
Goal
Reduce millions of items to a few hundred or thousand candidates.
millions of items → 500 candidates
Candidate Sources
Collaborative Filtering
Recommend items based on similar users.
users like you also liked X
Content-based Recommendation
Recommend items similar to what the user liked.
similar category / tags / embeddings
Trending / Popular
Recommend globally or regionally popular items.
Social Graph
Recommend items liked or shared by friends.
Recently Viewed / Purchased
Recommend related items based on recent behavior.
Ads / Sponsored Items
May be inserted as a separate candidate source.
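The collaborative-filtering source ("users like you also liked X") can be sketched as user-based CF over implicit feedback. This is a toy in-memory version assuming Jaccard similarity between users' liked-item sets; production systems typically precompute neighbors offline or use matrix factorization or embeddings instead. `user_based_cf` is a hypothetical name.

```python
from collections import defaultdict


def jaccard(a: set, b: set) -> float:
    """Overlap of two item sets, in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def user_based_cf(target: str, likes: dict[str, set[str]], k_neighbors: int = 2, n: int = 5) -> list[str]:
    """Score unseen items by summing the similarity of neighbors who liked them."""
    seen = likes.get(target, set())
    # Rank all other users by similarity to the target user and keep the top k.
    neighbors = sorted(
        ((jaccard(seen, items), user) for user, items in likes.items() if user != target),
        reverse=True,
    )[:k_neighbors]
    scores: dict[str, float] = defaultdict(float)
    for sim, user in neighbors:
        if sim == 0:
            continue  # no shared taste, so no evidence
        for item in likes[user] - seen:
            scores[item] += sim
    return sorted(scores, key=lambda i: (-scores[i], i))[:n]


likes = {
    "u1": {"i1", "i2", "i3"},
    "u2": {"i1", "i2", "i4"},
    "u3": {"i8", "i9"},
}
print(user_based_cf("u1", likes))  # ['i4']
```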
👉 Interview Answer
Candidate generation retrieves a broad set of potentially relevant items.
I would combine multiple candidate sources, such as collaborative filtering, content-based similarity, trending items, social graph signals, and recent user behavior.
The goal is high recall, because ranking will refine the final order.
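Combining sources for recall can be sketched as a round-robin merge with per-source quotas and deduplication. The source names and quota values below are illustrative assumptions, not a prescribed configuration.

```python
from itertools import zip_longest


def merge_candidates(sources: dict[str, list[str]], quotas: dict[str, int],
                     limit: int) -> list[tuple[str, str]]:
    """Interleave candidate lists, respecting per-source quotas and dropping duplicates.

    Returns (item_id, source) pairs so later stages can log which source produced each item.
    """
    # Truncate each source to its quota before interleaving.
    capped = {name: items[: quotas.get(name, 0)] for name, items in sources.items()}
    merged, seen = [], set()
    # Round-robin across sources so no single source dominates the pool.
    for row in zip_longest(*capped.values()):
        for name, item in zip(capped.keys(), row):
            if item is not None and item not in seen:
                seen.add(item)
                merged.append((item, name))
                if len(merged) == limit:
                    return merged
    return merged


sources = {"collaborative": ["i1", "i2", "i3"], "trending": ["i2", "i9"], "recent": ["i5"]}
quotas = {"collaborative": 2, "trending": 2, "recent": 1}
print(merge_candidates(sources, quotas, limit=10))
# [('i1', 'collaborative'), ('i2', 'trending'), ('i5', 'recent'), ('i9', 'trending')]
```

Keeping the source tag lets ranking use "which source" as a feature and lets you measure each source's contribution.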
7️⃣ Embeddings and Similarity Search
User Embedding
Represents user interests.
user_id → dense vector
Item Embedding
Represents item semantics.
item_id → dense vector
Retrieval
Use approximate nearest neighbor search:
user_embedding → top similar item embeddings
Examples:
- ANN index types: HNSW, IVF
- Libraries: FAISS, ScaNN
- Vector databases
👉 Interview Answer
Embeddings are commonly used for recommendation retrieval.
We can represent users and items as dense vectors, then use approximate nearest neighbor search to quickly find items similar to the user’s interests.
This is useful for large-scale candidate generation.
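What ANN indexes approximate is exact nearest-neighbor search. A brute-force version in plain Python makes the semantics concrete: it scans every item, so it is only viable for small catalogs, which is why HNSW- or IVF-style indexes are used at scale. The toy 3-dimensional embeddings are made up for illustration.

```python
import heapq
import math


def normalize(v: list[float]) -> list[float]:
    """Scale to unit length so the dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else v


def top_k_items(user_vec: list[float], item_vecs: dict[str, list[float]],
                k: int) -> list[tuple[str, float]]:
    """Exact top-k by cosine similarity: O(N * d) work per query."""
    u = normalize(user_vec)
    scored = (
        (item_id, sum(a * b for a, b in zip(u, normalize(vec))))
        for item_id, vec in item_vecs.items()
    )
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


items = {  # toy item embeddings
    "i1": [0.9, 0.1, 0.0],
    "i2": [0.1, 0.9, 0.0],
    "i3": [0.7, 0.3, 0.1],
}
user = [1.0, 0.2, 0.0]  # toy user embedding
print([item for item, _ in top_k_items(user, items, k=2)])  # ['i1', 'i3']
```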
8️⃣ Ranking
Goal
Order candidates by predicted user value.
Common Prediction Targets
- Click probability
- Watch time
- Purchase probability
- Conversion probability
- Long-term engagement
- Retention impact
Ranking Signals
- User interests
- User recent behavior
- Item category
- Item popularity
- Freshness
- Creator quality
- Context:
  - Time of day
  - Device
  - Location
- Negative feedback
- Business constraints
Ranking Pipeline
Candidates
→ Feature Fetching
→ Lightweight Ranking
→ ML Ranking
→ Re-ranking
→ Final List
👉 Interview Answer
Ranking predicts how valuable each candidate is for the user.
The ranking model may optimize for click probability, watch time, purchase probability, or long-term engagement.
It uses user features, item features, context features, and historical behavior signals.
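A ranking stage can be sketched as a logistic model that maps a feature vector to a predicted click probability and sorts by it. The feature names and weights are invented for illustration; a real ranker is learned from logged data (for example gradient-boosted trees or a deep network) and often blends several predicted objectives.

```python
import math

# Hypothetical learned weights for a pCTR model (in practice these come from training).
WEIGHTS = {
    "user_category_affinity": 2.0,  # how much the user likes this item's category
    "item_ctr_7d": 3.0,             # historical click rate of the item
    "freshness": 0.5,               # 1.0 for brand-new items, decaying toward 0
    "negative_feedback": -4.0,      # user hid or skipped similar items
}
BIAS = -2.0


def predict_ctr(features: dict[str, float]) -> float:
    """Logistic regression: sigmoid(w . x + b). Missing features default to 0."""
    z = BIAS + sum(w * features.get(name, 0.0) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def rank(candidates: dict[str, dict[str, float]]) -> list[tuple[str, float]]:
    """Score every candidate and sort by predicted click probability, highest first."""
    scored = [(item_id, predict_ctr(f)) for item_id, f in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


candidates = {
    "i1": {"user_category_affinity": 0.9, "item_ctr_7d": 0.05, "freshness": 0.2},
    "i2": {"user_category_affinity": 0.3, "item_ctr_7d": 0.20, "freshness": 0.9},
    "i3": {"user_category_affinity": 0.9, "item_ctr_7d": 0.05, "negative_feedback": 1.0},
}
print([item_id for item_id, _ in rank(candidates)])  # ['i1', 'i2', 'i3']
```

Note how `i3` drops to the bottom despite matching the user's interests, because the negative-feedback feature dominates.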
9️⃣ Re-ranking
Why Re-ranking?
Ranking purely by model score can produce a poor user experience.
Problems:
- Too many similar items
- Too many items from the same creator
- Too many old items
- Unsafe content
- Over-optimization for clicks
- Filter bubble
Re-ranking Goals
- Diversity
- Freshness
- Safety
- Business rules
- Creator fairness
- Exploration
- Ad insertion
- Deduplication
Example
The final feed should not show 10 videos from the same creator in a row.
👉 Interview Answer
Re-ranking is applied after model scoring to improve the final user experience.
It enforces diversity, freshness, safety, deduplication, and business constraints.
This prevents the system from simply showing the highest-scoring but repetitive results.
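The "no long runs from one creator" rule can be sketched as a greedy re-ranker: walk the model-ordered list and, whenever the next item would extend a run past the limit, pull forward the best-scoring item from a different creator. `max_run=2` is an assumed policy value.

```python
def rerank_creator_diversity(ranked: list[tuple[str, str]], max_run: int = 2) -> list[tuple[str, str]]:
    """ranked: (item_id, creator_id) pairs sorted by model score, best first.

    Greedily builds a list where no creator appears more than max_run times in a row,
    preferring higher-scored items whenever the constraint allows.
    """
    remaining = list(ranked)
    result: list[tuple[str, str]] = []
    while remaining:
        # If the last max_run items share one creator, that creator is blocked for this slot.
        blocked = None
        if len(result) >= max_run and len({c for _, c in result[-max_run:]}) == 1:
            blocked = result[-1][1]
        # Take the best remaining item that does not extend the blocked run.
        pick = next((i for i, (_, c) in enumerate(remaining) if c != blocked), None)
        if pick is None:
            pick = 0  # only the blocked creator is left: relax the rule rather than drop items
        result.append(remaining.pop(pick))
    return result


ranked = [("v1", "A"), ("v2", "A"), ("v3", "A"), ("v4", "B"), ("v5", "A")]
print([item for item, _ in rerank_creator_diversity(ranked)])  # ['v1', 'v2', 'v4', 'v3', 'v5']
```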
🔟 Feature Store
Why Needed?
Ranking requires many features.
Examples:
- User features
- Item features
- User-item interaction features
- Real-time session features
Offline Features
Computed from historical data.
Examples:
user_favorite_categories_30d
item_click_rate_7d
creator_quality_score
Online Features
Updated in near real time.
Examples:
recent_clicks
current_session_items
last_viewed_category
Feature Consistency
Important problem:
training-serving skew
This happens when training features and serving features are computed differently.
👉 Interview Answer
A feature store helps provide consistent features for both training and serving.
Offline features are computed from historical data, while online features capture recent user behavior.
Avoiding training-serving skew is important, because inconsistent features can hurt model quality.
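One common source of training-serving skew is leaking future feature values into training data. A point-in-time lookup avoids it by returning, for each training example, the value that was current at the example's timestamp, which is what serving would have read then. This in-memory `PointInTimeFeatureStore` is a hypothetical sketch, not the API of any specific feature-store product.

```python
import bisect
from collections import defaultdict


class PointInTimeFeatureStore:
    """Keeps a time-ordered history per (entity_id, feature_name)."""

    def __init__(self):
        # (entity_id, feature_name) -> parallel lists of sorted timestamps and values
        self._times = defaultdict(list)
        self._values = defaultdict(list)

    def write(self, entity_id: str, feature_name: str, ts: int, value) -> None:
        key = (entity_id, feature_name)
        idx = bisect.bisect_right(self._times[key], ts)  # keep history sorted by ts
        self._times[key].insert(idx, ts)
        self._values[key].insert(idx, value)

    def get_as_of(self, entity_id: str, feature_name: str, ts: int, default=None):
        """Latest value written at or before ts (used when building training rows)."""
        times = self._times.get((entity_id, feature_name), [])
        idx = bisect.bisect_right(times, ts)
        return self._values[(entity_id, feature_name)][idx - 1] if idx else default

    def get_latest(self, entity_id: str, feature_name: str, default=None):
        """What online serving reads right now."""
        values = self._values.get((entity_id, feature_name))
        return values[-1] if values else default


store = PointInTimeFeatureStore()
store.write("i789", "item_ctr_7d", ts=100, value=0.02)
store.write("i789", "item_ctr_7d", ts=200, value=0.08)
print(store.get_as_of("i789", "item_ctr_7d", ts=150))  # 0.02, what a request at t=150 saw
print(store.get_latest("i789", "item_ctr_7d"))         # 0.08
```

Training on `get_latest` for an impression at t=150 would give the model information it never had at serving time.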
1️⃣1️⃣ Online Serving Flow
Flow
User opens app
→ Recommendation request
→ Fetch user profile and context
→ Generate candidates from multiple sources
→ Fetch features
→ Rank candidates
→ Re-rank for diversity and rules
→ Return results
→ Log served recommendations
Latency Budget Example
Total budget: 100ms
Candidate generation: 30ms
Feature fetch: 30ms
Ranking: 30ms
Re-ranking + response: 10ms
👉 Interview Answer
Online serving must be low latency.
I would generate candidates from multiple sources, fetch features from a low-latency feature store, rank candidates using an online model, and then apply re-ranking rules.
The system must log what was served so we can later connect impressions to clicks and conversions.
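Fanning out to candidate sources under the latency budget can be sketched with `concurrent.futures`: query all sources in parallel, wait up to the stage budget, and keep whatever finished successfully in time. The 30 ms budget mirrors the example above; the source functions are hypothetical stand-ins, one of which sleeps to simulate a slow dependency.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait

CANDIDATE_BUDGET_S = 0.030  # 30 ms, taken from the latency budget example


# Stand-in candidate sources; slow_social simulates a source that exceeds its budget.
def trending(user_id: str) -> list[str]:
    return ["i1", "i2"]


def collaborative(user_id: str) -> list[str]:
    return ["i3"]


def slow_social(user_id: str) -> list[str]:
    time.sleep(0.2)
    return ["i9"]


SOURCES = {"trending": trending, "collaborative": collaborative, "social": slow_social}
EXECUTOR = ThreadPoolExecutor(max_workers=8)  # shared across requests, not created per call


def generate_candidates(user_id: str) -> dict[str, list[str]]:
    """Return candidates from every source that answered successfully within the budget."""
    futures = {EXECUTOR.submit(fn, user_id): name for name, fn in SOURCES.items()}
    done, not_done = wait(futures, timeout=CANDIDATE_BUDGET_S)
    for f in not_done:
        f.cancel()  # best effort only: a thread that is already running keeps running
    results = {}
    for f in done:
        if f.exception() is None:  # a failed source is skipped, not fatal
            results[futures[f]] = f.result()
    return results


start = time.perf_counter()
candidates = generate_candidates("u123")
elapsed_ms = (time.perf_counter() - start) * 1000
print(sorted(candidates), f"{elapsed_ms:.0f} ms")
```

The slow source is dropped for this request instead of pushing the whole response past its budget.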
1️⃣2️⃣ Feedback Loop
User Feedback Events
Positive signals:
- Click
- Like
- Share
- Purchase
- Long watch time
Negative signals:
- Hide
- Skip
- Short watch time
- Report
- Not interested
Feedback Pipeline
User interaction
→ Event ingestion
→ Stream processing
→ Feature updates
→ Training data generation
→ Model retraining
→ Model deployment
👉 Interview Answer
Recommendation systems rely on feedback loops.
User interactions are collected as events, processed into features, used for training data, and then fed back into future models.
Both positive and negative feedback are important for improving recommendation quality.
1️⃣3️⃣ Cold Start Problem
New User Cold Start
No behavior history.
Strategies:
- Ask for onboarding preferences
- Use location / language / device
- Use trending items
- Use demographic-level recommendations
- Explore diverse content
New Item Cold Start
No engagement data.
Strategies:
- Use content metadata
- Use item embeddings
- Boost new content temporarily
- Explore with small traffic
- Use creator reputation
👉 Interview Answer
Cold start is a major challenge.
For new users, I would use onboarding preferences, location, language, and trending content.
For new items, I would use content metadata, embeddings, creator reputation, and controlled exploration traffic.
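The "boost new content temporarily" strategy can be sketched as a multiplicative boost that decays exponentially with item age, so a new item gets some exposure before it has engagement data. `max_boost` and `half_life_hours` are assumed tuning knobs, not recommended values.

```python
import math


def freshness_boost(age_hours: float, max_boost: float = 0.5, half_life_hours: float = 24.0) -> float:
    """Multiplier in (1, 1 + max_boost]; the extra boost halves every half_life_hours."""
    decay = math.exp(-math.log(2) * age_hours / half_life_hours)
    return 1.0 + max_boost * decay


def boosted_score(model_score: float, age_hours: float) -> float:
    return model_score * freshness_boost(age_hours)


print(round(freshness_boost(0), 3))    # 1.5   brand-new item
print(round(freshness_boost(24), 3))   # 1.25  one half-life later
print(round(freshness_boost(240), 3))  # 1.0   boost has essentially vanished
```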
1️⃣4️⃣ Exploration vs Exploitation
Exploitation
Show items the model already believes the user will like.
Pros:
- Higher short-term engagement
Cons:
- Less discovery
- Filter bubble
Exploration
Show uncertain or new items.
Pros:
- Discover new interests
- Collect training data
- Help new items
Cons:
- May reduce short-term engagement
Strategy
Use:
mostly exploitation + small exploration percentage
Example:
90% ranked items
10% exploration items
👉 Interview Answer
Recommendation systems need to balance exploration and exploitation.
Exploitation maximizes known user preferences, while exploration discovers new interests and collects data for new items.
A small percentage of the feed can be reserved for exploration.
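The 90/10 split can be sketched as epsilon-greedy slot filling: each position comes from the exploration pool with probability `epsilon`, otherwise from the ranked list. This is the simplest possible policy; real systems often use bandit methods such as Thompson sampling. The seeded `random.Random` keeps the sketch reproducible.

```python
import random
from typing import Optional


def blend_exploration(ranked: list[str], explore_pool: list[str], limit: int,
                      epsilon: float = 0.1,
                      rng: Optional[random.Random] = None) -> list[tuple[str, str]]:
    """Fill `limit` slots; each slot is an exploration slot with probability epsilon.

    Returns (item_id, "exploit" | "explore") pairs so exploration impressions can be
    logged and analyzed separately.
    """
    rng = rng or random.Random()
    exploit = list(ranked)
    explore = list(explore_pool)
    rng.shuffle(explore)  # exploration items are drawn uniformly at random
    out: list[tuple[str, str]] = []
    seen: set[str] = set()
    while len(out) < limit and (exploit or explore):
        use_explore = bool(explore) and (not exploit or rng.random() < epsilon)
        item = explore.pop() if use_explore else exploit.pop(0)
        if item not in seen:  # an item can sit in both pools; show it once
            seen.add(item)
            out.append((item, "explore" if use_explore else "exploit"))
    return out


feed = blend_exploration([f"r{i}" for i in range(50)], [f"e{i}" for i in range(50)],
                         limit=20, rng=random.Random(7))
print(sum(1 for _, kind in feed if kind == "explore"), "of", len(feed), "slots explore")
```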
1️⃣5️⃣ A/B Testing and Model Evaluation
Offline Metrics
- AUC
- Precision@K
- Recall@K
- NDCG
- MAP
- Loss
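Of the offline metrics, NDCG@K is the most ranking-specific: it rewards placing relevant items near the top using a logarithmic position discount, normalized by the best possible ordering. This sketch uses the `rel / log2(rank + 1)` gain (some definitions use `2^rel - 1` instead) and, as a simplifying assumption, computes the ideal ordering from the served list only.

```python
import math


def dcg_at_k(relevances: list[float], k: int) -> float:
    """Discounted cumulative gain; position 1 has discount log2(2) = 1."""
    return sum(rel / math.log2(pos + 1) for pos, rel in enumerate(relevances[:k], start=1))


def ndcg_at_k(relevances: list[float], k: int) -> float:
    """relevances: graded relevance of the served list, in served order."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


print(round(ndcg_at_k([0, 1, 1], k=3), 3))  # 0.693: relevant items were ranked too low
print(ndcg_at_k([1, 1, 0], k=3))            # 1.0: already the ideal order
```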
Online Metrics
- CTR
- Watch time
- Conversion rate
- Revenue
- Retention
- Session length
- Hide / report rate
A/B Testing Flow
User assigned to experiment
→ Recommendation service uses model variant
→ Log served items and user actions
→ Compare metrics between groups
👉 Interview Answer
Recommendation changes must be tested carefully.
Offline metrics are useful, but online A/B testing is required because user behavior and business metrics may differ from offline predictions.
I would compare engagement, conversion, retention, and negative feedback across experiment groups.
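Experiment assignment must be deterministic and sticky (it is also listed under strong consistency below). A common approach, sketched here, hashes the user ID salted with the experiment ID into a bucket; `bucket`, `assign_variant`, and the 50/50 split are assumptions.

```python
import hashlib


def bucket(user_id: str, experiment_id: str, num_buckets: int = 100) -> int:
    """Stable bucket in [0, num_buckets); salting by experiment decorrelates experiments."""
    digest = hashlib.sha256(f"{experiment_id}:{user_id}".encode()).hexdigest()
    return int(digest[:8], 16) % num_buckets


def assign_variant(user_id: str, experiment_id: str, treatment_pct: int = 50) -> str:
    """Same user and experiment always map to the same variant, with no stored state."""
    return "treatment" if bucket(user_id, experiment_id) < treatment_pct else "control"


print(assign_variant("u123", "exp_ranker_v2"))
```

Because assignment is a pure function of the IDs, every service (serving, logging, analysis) can recompute it and agree.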
1️⃣6️⃣ Scaling Patterns
Pattern 1: Multi-stage Recommendation
Candidate generation → ranking → re-ranking
Pattern 2: Precompute Candidates
Precompute:
- Similar items
- User embeddings
- Item embeddings
- Popular items
- User candidate pools
Pattern 3: Cache Hot Features
Cache:
- User features
- Item features
- Popular item lists
- Embeddings
Pattern 4: Separate Online and Offline Systems
- Offline = batch training and feature generation
- Online = low-latency serving
Pattern 5: Real-time Feature Updates
Use stream processing for:
- Recent clicks
- Recent views
- Session behavior
- Trending items
👉 Interview Answer
To scale recommendations, I would use a multi-stage pipeline, precompute candidate pools and embeddings, cache hot features, and separate offline training from online serving.
Real-time features can be updated through stream processing to improve freshness.
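Caching hot features in the serving process can be sketched as an LRU cache with a TTL, so the cache stays bounded and entries are never older than the staleness the features can tolerate. `FeatureCache` is a hypothetical hand-rolled class (libraries such as `cachetools` offer similar ones); the clock is injectable so expiry can be tested.

```python
import time
from collections import OrderedDict
from typing import Any, Callable


class FeatureCache:
    """Bounded LRU cache whose entries expire after ttl_s seconds."""

    def __init__(self, max_entries: int, ttl_s: float, clock: Callable[[], float] = time.monotonic):
        self.max_entries, self.ttl_s, self.clock = max_entries, ttl_s, clock
        self._data = OrderedDict()  # key -> (loaded_at, value), oldest use first

    def get(self, key: str, loader: Callable[[str], Any]) -> Any:
        """Return a fresh cached value, or call loader (e.g. a feature-store read) on miss/expiry."""
        now = self.clock()
        entry = self._data.get(key)
        if entry is not None and now - entry[0] < self.ttl_s:
            self._data.move_to_end(key)  # mark as most recently used
            return entry[1]
        value = loader(key)
        self._data[key] = (now, value)
        self._data.move_to_end(key)
        if len(self._data) > self.max_entries:
            self._data.popitem(last=False)  # evict the least recently used entry
        return value


fake_now = [0.0]
calls = []


def load_features(user_id: str) -> dict:
    calls.append(user_id)  # record feature-store reads
    return {"recent_clicks": 3}


cache = FeatureCache(max_entries=2, ttl_s=60, clock=lambda: fake_now[0])
cache.get("u1", load_features)
cache.get("u1", load_features)  # served from cache
print(calls)  # ['u1']
```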
1️⃣7️⃣ Failure Handling
Common Failures
- Candidate service timeout
- Feature store unavailable
- Ranking model timeout
- Model deployment issue
- Event ingestion lag
- Bad recommendations
- Missing user history (cold start) requiring a fallback
Strategies
- Fallback to popular items
- Use cached recommendations
- Skip failed candidate source
- Use simpler ranking model
- Circuit breaker around model service
- Roll back model version
- Monitor negative feedback
👉 Interview Answer
Recommendation systems should degrade gracefully.
If the ranking model is unavailable, we can fall back to cached recommendations, trending items, or a simpler ranking model.
If one candidate source fails, the system can still use other sources.
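The degradation strategies can be sketched as an ordered fallback chain (ML ranker, then a simpler ranker, then popular items) plus a simple circuit breaker that stops calling the ML ranker for a cool-down period after repeated failures. Thresholds and function names are hypothetical.

```python
import time


class CircuitBreaker:
    """Opens after `threshold` consecutive failures; lets calls probe again after `cooldown_s`."""

    def __init__(self, threshold: int = 3, cooldown_s: float = 30.0, clock=time.monotonic):
        self.threshold, self.cooldown_s, self.clock = threshold, cooldown_s, clock
        self.failures, self.opened_at = 0, None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        return self.clock() - self.opened_at >= self.cooldown_s  # cool-down over: probe

    def record(self, ok: bool) -> None:
        if ok:
            self.failures, self.opened_at = 0, None
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = self.clock()  # (re)open the breaker


def recommend_with_fallback(user_id, candidates, ranker, simple_ranker, popular, breaker):
    """Try the ML ranker, then a simpler ranker, then a popular-items list."""
    if breaker.allow():
        try:
            result = ranker(user_id, candidates)
            breaker.record(ok=True)
            return result, "ml_ranker"
        except Exception:
            breaker.record(ok=False)  # count the failure and fall through
    try:
        return simple_ranker(user_id, candidates), "simple_ranker"
    except Exception:
        return list(popular), "popular"


ml_calls = [0]


def broken_ranker(user_id, candidates):
    ml_calls[0] += 1
    raise TimeoutError("model timeout")


def simple_ranker(user_id, candidates):
    return sorted(candidates)


fake_now = [0.0]
breaker = CircuitBreaker(threshold=2, cooldown_s=10, clock=lambda: fake_now[0])
for _ in range(3):
    print(recommend_with_fallback("u1", ["b", "a"], broken_ranker, simple_ranker, ["p1"], breaker))
print("ML ranker called", ml_calls[0], "times")  # 2: the third request skipped it
```

Returning the tier name alongside the result makes the fallback rate easy to log and monitor.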
1️⃣8️⃣ Consistency Model
Stronger Consistency Needed For
- User privacy settings
- Blocked content
- Removed items
- Safety policy decisions
- Experiment assignment
Eventual Consistency Acceptable For
- User interest updates
- Ranking features
- Popularity counters
- Embeddings
- Training data
- Analytics
👉 Interview Answer
Most recommendation features can be eventually consistent.
It is acceptable if a user’s latest click updates recommendations a few seconds later.
But privacy settings, blocked content, removed items, and safety rules must be enforced correctly at serving time.
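Because these rules must hold at serving time, a common pattern is a final hard filter that reads them from an authoritative source right before responding, even if ranking used stale features. This sketch assumes the sets were already fetched for the current request; `apply_hard_filters` is a hypothetical name.

```python
def apply_hard_filters(items: list[dict], removed_item_ids: set[str],
                       blocked_creator_ids: set[str], unsafe_item_ids: set[str]) -> list[dict]:
    """Drop anything that must never be served, regardless of its score.

    The sets are assumed to come from a strongly consistent source (e.g. the primary
    item/policy database), not from eventually consistent caches or features.
    """
    return [
        it for it in items
        if it["item_id"] not in removed_item_ids
        and it["creator_id"] not in blocked_creator_ids
        and it["item_id"] not in unsafe_item_ids
    ]


ranked_items = [
    {"item_id": "i1", "creator_id": "c1"},
    {"item_id": "i2", "creator_id": "c1"},  # removed by its creator
    {"item_id": "i3", "creator_id": "c9"},  # creator blocked by this user
    {"item_id": "i4", "creator_id": "c2"},  # flagged by safety policy
]
print([it["item_id"] for it in apply_hard_filters(ranked_items, {"i2"}, {"c9"}, {"i4"})])  # ['i1']
```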
1️⃣9️⃣ Observability
System Metrics
- Recommendation API latency
- Candidate generation latency
- Feature store latency
- Ranking latency
- Error rate
- Cache hit rate
- Fallback rate
- Model timeout rate
Quality Metrics
- CTR
- Conversion rate
- Watch time
- Retention
- Diversity
- Freshness
- Negative feedback rate
- Coverage
- Novelty
👉 Interview Answer
I would monitor both system health and recommendation quality.
System metrics include latency, error rate, feature store performance, and fallback rate.
Quality metrics include CTR, conversion, watch time, diversity, freshness, and negative feedback.
2️⃣0️⃣ End-to-End Flow
Offline Training Flow
User events collected
→ Data pipeline cleans events
→ Generate training examples
→ Compute features
→ Train model
→ Evaluate model
→ Register model
→ Deploy model
Online Serving Flow
User opens app
→ Recommendation service receives request
→ Generate candidates
→ Fetch features
→ Rank candidates
→ Re-rank for diversity and safety
→ Return recommendations
→ Log impression
Feedback Flow
User interacts with item
→ Event logged
→ Stream updates real-time features
→ Batch pipeline updates training data
→ Future recommendations improve
Key Insight
A recommendation system is not just a model; it is a feedback-driven ranking platform.
🧠 Staff-Level Answer (Final)
👉 Interview Answer (Full Version)
When designing a recommendation system, I think of it as a feedback-driven personalized ranking platform.
The system has two main pipelines: an offline pipeline for training and feature generation, and an online pipeline for low-latency serving.
The offline pipeline collects user events, generates training data, computes user and item features, trains models, evaluates them, and deploys model versions.
The online pipeline receives a recommendation request, generates candidates from multiple sources, fetches features, ranks candidates using a model, and then re-ranks results for diversity, freshness, safety, and business rules.
Candidate generation is optimized for recall. It may use collaborative filtering, content-based similarity, embeddings, trending items, social graph signals, and recent user behavior.
Ranking is optimized for precision and predicts objectives such as click probability, watch time, purchase probability, or long-term engagement.
A feature store is important to provide consistent features for both training and serving and to avoid training-serving skew.
The system must handle cold start. For new users, I would use onboarding preferences, location, language, and trending items. For new items, I would use metadata, embeddings, creator reputation, and controlled exploration.
I would continuously evaluate models using offline metrics and online A/B testing, because offline performance does not always translate to better user behavior.
The main trade-offs are relevance, latency, diversity, freshness, exploration, and system cost.
Ultimately, the goal is to recommend the right item to the right user at the right time, while continuously learning from feedback.
⭐ Final Insight
At its core, a recommendation system is not a single model but a continuously learning system built from candidate generation, ranking, the feedback loop, and A/B testing.
Implement