System Design Deep Dive - 16 Design Recommendation System

Post by ailswan May. 09, 2026


🎯 Design Recommendation System

1️⃣ Core Framework

When discussing Recommendation System design, I frame it as:

  1. User behavior collection
  2. Offline feature and model pipeline
  3. Candidate generation
  4. Ranking and re-ranking
  5. Online serving architecture
  6. Feedback loop and experimentation
  7. Cold start and freshness
  8. Trade-offs: relevance vs latency vs diversity

2️⃣ Core Requirements


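Functional Requirements

Return a personalized, ranked list of items for a given user and surface
Track user interaction events such as clicks
Accept explicit feedback such as "not interested"

Non-functional Requirements

Low-latency serving
Fresh and diverse results
Graceful degradation when a dependency fails
Bounded system cost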

👉 Interview Answer

A recommendation system is a personalized ranking system.

It collects user behavior, generates candidate items, ranks them using user and item features, and continuously improves through feedback.

The main challenge is balancing relevance, latency, diversity, freshness, and system cost.


3️⃣ Main APIs


Get Recommendations

GET /api/recommendations?userId=u123&surface=home&limit=20

Response:

{
  "items": [
    {
      "itemId": "i789",
      "score": 0.94,
      "reason": "Because you watched similar videos"
    }
  ]
}

Track User Event

POST /api/events

Request:

{
  "userId": "u123",
  "itemId": "i789",
  "eventType": "click",
  "timestamp": "2026-05-02T10:00:00Z",
  "context": {
    "surface": "home",
    "device": "mobile"
  }
}

Feedback API

POST /api/recommendations/feedback

Request:

{
  "userId": "u123",
  "itemId": "i789",
  "feedback": "not_interested"
}

👉 Interview Answer

I would expose a recommendation serving API and separate event tracking APIs.

Recommendation serving must be low latency, while user events can be processed asynchronously for feature updates, model training, and analytics.
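
A minimal sketch of that split, assuming an in-process queue stands in for a durable event log (a real deployment would more likely use a message broker). The names `track_event`, `drain_events`, and `event_queue` are hypothetical and only illustrate that the request path does nothing beyond validation and enqueueing.

```python
import queue

# Stand-in for a durable event log (assumption for this sketch).
event_queue = queue.Queue()

def track_event(event: dict) -> dict:
    """Handler for POST /api/events: validate, enqueue, return immediately."""
    required = {"userId", "itemId", "eventType", "timestamp"}
    missing = required - event.keys()
    if missing:
        return {"status": 400, "error": f"missing fields: {sorted(missing)}"}
    event_queue.put(event)   # no feature or model work on the request path
    return {"status": 202}   # accepted for asynchronous processing

def drain_events(max_events: int = 100) -> list:
    """Consumer side: pull a batch for feature updates, training data, analytics."""
    batch = []
    while len(batch) < max_events:
        try:
            batch.append(event_queue.get_nowait())
        except queue.Empty:
            break
    return batch
```

Returning 202 rather than 200 signals that the event was accepted but not yet processed, which matches the asynchronous design.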


4️⃣ Data Model


User Profile

user_profile (
  user_id VARCHAR PRIMARY KEY,
  age_group VARCHAR,
  region VARCHAR,
  language VARCHAR,
  interests JSON,
  created_at TIMESTAMP,
  updated_at TIMESTAMP
)

Item Table

item (
  item_id VARCHAR PRIMARY KEY,
  item_type VARCHAR,
  title TEXT,
  category VARCHAR,
  tags ARRAY,
  creator_id VARCHAR,
  status VARCHAR,
  created_at TIMESTAMP,
  metadata JSON
)

User Event Table

user_event (
  event_id VARCHAR PRIMARY KEY,
  user_id VARCHAR,
  item_id VARCHAR,
  event_type VARCHAR,
  timestamp TIMESTAMP,
  context JSON
)

Feature Store

feature_store (
  entity_id VARCHAR,
  entity_type VARCHAR,
  feature_name VARCHAR,
  feature_value JSON,
  updated_at TIMESTAMP,
  PRIMARY KEY (entity_id, entity_type, feature_name)
)

Recommendation Log

recommendation_log (
  request_id VARCHAR PRIMARY KEY,
  user_id VARCHAR,
  item_ids ARRAY,
  model_version VARCHAR,
  experiment_id VARCHAR,
  served_at TIMESTAMP
)

👉 Interview Answer

I would store user profiles, item metadata, user interaction events, features, and recommendation logs separately.

User events are the raw signal. Feature store provides online and offline features. Recommendation logs are important for debugging, training data generation, and A/B test analysis.
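
To illustrate why recommendation logs matter for training data, here is a rough sketch that joins served items with later user events to produce labeled examples. It assumes rows shaped like the `recommendation_log` and `user_event` tables above; the function name, the `purchase` event type, and the lack of an attribution time window are simplifications made for this example.

```python
def build_training_examples(rec_logs, events, positive_types=("click", "purchase")):
    """Label each served (user, item) pair: 1 if a positive event exists, else 0."""
    positives = {
        (e["user_id"], e["item_id"])
        for e in events
        if e["event_type"] in positive_types
    }
    examples = []
    for log in rec_logs:
        for item_id in log["item_ids"]:
            examples.append({
                "request_id": log["request_id"],
                "user_id": log["user_id"],
                "item_id": item_id,
                "model_version": log["model_version"],
                "label": int((log["user_id"], item_id) in positives),
            })
    return examples
```

Items that were served but never clicked become negative examples, which is only possible because impressions were logged.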


5️⃣ High-Level Architecture


User Events
→ Event Ingestion
→ Stream Processing
→ Feature Store
→ Offline Training Pipeline
→ Model Registry

Online Request
→ Recommendation Service
→ Candidate Generation
→ Feature Fetching
→ Ranking Model
→ Re-ranking
→ Response
→ Logging / Feedback Loop

Two Main Pipelines

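Offline Pipeline

Process historical behavior
Train models
Generate embeddings
Compute features

Online Pipeline

Candidate generation
Feature fetching
Ranking
Re-ranking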

👉 Interview Answer

I would separate recommendation into offline and online pipelines.

Offline pipelines process historical behavior, train models, generate embeddings, and compute features.

Online pipelines serve recommendations in real time using candidate generation, feature fetching, ranking, and re-ranking.


6️⃣ Candidate Generation


Goal

Reduce millions of items to a few hundred or thousand candidates.

millions of items → 500 candidates

Candidate Sources

Collaborative Filtering

Recommend items based on similar users.

users like you also liked X

Content-based Recommendation

Recommend items similar to what the user liked.

similar category / tags / embeddings

Trending / Popular

Recommend globally or regionally popular items.


Social Graph

Recommend items liked or shared by friends.


Recently Viewed / Purchased

Recommend related items based on recent behavior.


Ads / Sponsored Items

May be inserted as a separate candidate source.


👉 Interview Answer

Candidate generation retrieves a broad set of potentially relevant items.

I would combine multiple candidate sources, such as collaborative filtering, content-based similarity, trending items, social graph signals, and recent user behavior.

The goal is high recall, because ranking will refine the final order.
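
One way to combine sources is a round-robin merge with de-duplication, sketched below. The function name and the choice of round-robin (rather than, say, per-source quotas) are assumptions for illustration; each source is taken to return item ids already ordered by its own relevance.

```python
from itertools import zip_longest

def merge_candidates(sources: dict, limit: int = 500) -> list:
    """Round-robin across sources, de-duplicating by item id, until `limit` is reached.

    Returns (item_id, source_name) pairs so the ranker can use the source as a feature.
    """
    seen = set()
    merged = []
    for tier in zip_longest(*sources.values()):
        for source_name, item_id in zip(sources.keys(), tier):
            if item_id is None or item_id in seen:
                continue
            seen.add(item_id)
            merged.append((item_id, source_name))
            if len(merged) >= limit:
                return merged
    return merged
```

A source that fails can be passed in as an empty list, so the merge still works with the remaining sources.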


7️⃣ Embeddings and Similarity Search


User Embedding

Represents user interests.

user_id → dense vector

Item Embedding

Represents item semantics.

item_id → dense vector

Retrieval

Use approximate nearest neighbor search:

user_embedding → top similar item embeddings

Examples:

HNSW (graph-based index)
IVF (inverted-file index)
FAISS (library)
ScaNN (library)
Vector DB

👉 Interview Answer

Embeddings are commonly used for recommendation retrieval.

We can represent users and items as dense vectors, then use approximate nearest neighbor search to quickly find items similar to the user’s interests.

This is useful for large-scale candidate generation.
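
For intuition, the sketch below does exact top-k retrieval by dot product over a tiny in-memory catalog. It is a brute-force stand-in, not an ANN index: HNSW or IVF approximate the same result without scanning every item. The function name and data layout are assumptions.

```python
import heapq

def top_k_items(user_vec, item_vecs, k=3):
    """Exact top-k by dot product; an ANN index approximates this at much lower cost."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    scored = ((dot(user_vec, vec), item_id) for item_id, vec in item_vecs.items())
    return [item_id for _, item_id in heapq.nlargest(k, scored)]
```

Usage: `top_k_items([1.0, 0.2], {"i1": [1.0, 0.0], "i2": [0.0, 1.0], "i3": [0.7, 0.7]}, k=2)` scores the three items at 1.0, 0.2, and 0.84, so it returns `["i1", "i3"]`.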


8️⃣ Ranking


Goal

Order candidates by predicted user value.


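Common Prediction Targets

click probability
watch time
purchase probability
long-term engagement

Ranking Signals

user features
item features
context features
historical behavior signals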

Ranking Pipeline

Candidates
→ Feature Fetching
→ Lightweight Ranking
→ ML Ranking
→ Re-ranking
→ Final List

👉 Interview Answer

Ranking predicts how valuable each candidate is for the user.

The ranking model may optimize for click probability, watch time, purchase probability, or long-term engagement.

It uses user features, item features, context features, and historical behavior signals.
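
When the model predicts several targets, they are often combined into a single value score before sorting. The sketch below assumes the per-objective predictions already exist on each candidate; the names `p_click` and `p_purchase` and the weights are hypothetical and would be tuned per product.

```python
def value_score(preds: dict, weights: dict) -> float:
    """Weighted sum of per-objective predictions; missing objectives count as 0."""
    return sum(w * preds.get(name, 0.0) for name, w in weights.items())

def rank(candidates: list, weights: dict) -> list:
    """Sort candidates by descending combined value."""
    return sorted(candidates, key=lambda c: value_score(c["preds"], weights), reverse=True)
```

With weights `{"p_click": 1.0, "p_purchase": 5.0}`, an item with a lower click probability but a higher purchase probability can outrank a more clickable one, which is the point of optimizing for value rather than clicks alone.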


9️⃣ Re-ranking


Why Re-ranking?

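Ranking purely by model score can produce a poor user experience.

Problems:

repetitive results, e.g. many items from the same creator
duplicate items

Re-ranking Goals

diversity
freshness
safety
deduplication
business constraints

Example

The final feed should not show 10 videos from the same creator in a row.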

👉 Interview Answer

Re-ranking is applied after model scoring to improve the final user experience.

It enforces diversity, freshness, safety, deduplication, and business constraints.

This prevents the system from simply showing the highest-scoring but repetitive results.
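
A minimal sketch of one such rule: a greedy pass that removes duplicate items and limits how many consecutive items may come from the same creator. The function name and the `max_run` parameter (assumed to be at least 1) are illustrative; real re-rankers usually combine several rules like this.

```python
def rerank_with_creator_cap(ranked: list, max_run: int = 2) -> list:
    """Greedy pass over score-ordered items that allows at most `max_run`
    consecutive items from one creator and drops duplicate item ids."""
    remaining = []
    seen = set()
    for item in ranked:  # deduplicate, keeping the best-ranked copy
        if item["item_id"] not in seen:
            seen.add(item["item_id"])
            remaining.append(item)
    result = []
    while remaining:
        tail = [r["creator_id"] for r in result[-max_run:]]
        blocked = tail[0] if len(tail) == max_run and len(set(tail)) == 1 else None
        # Take the highest-ranked item that does not extend a too-long run;
        # if every remaining item is from the blocked creator, keep score order.
        pick = next((i for i, it in enumerate(remaining) if it["creator_id"] != blocked), 0)
        result.append(remaining.pop(pick))
    return result
```

The pass only reorders and deduplicates; it never drops a distinct item, so lower-scored items from other creators are pulled forward rather than replacing anything.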


🔟 Feature Store


Why Needed?

Ranking requires many features.

Examples:


Offline Features

Computed from historical data.

Examples:

user_favorite_categories_30d
item_click_rate_7d
creator_quality_score

Online Features

Updated in near real time.

Examples:

recent_clicks
current_session_items
last_viewed_category

Feature Consistency

Important problem:

training-serving skew

This happens when training features and serving features are computed differently.


👉 Interview Answer

A feature store helps provide consistent features for both training and serving.

Offline features are computed from historical data, while online features capture recent user behavior.

Avoiding training-serving skew is important, because inconsistent features can hurt model quality.
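
One common way to reduce skew is to define each feature once and call the same function from both the training pipeline and the serving path. The sketch below does this for `item_click_rate_7d`; the `impression` event type and the in-memory event list are assumptions made to keep the example self-contained.

```python
from datetime import datetime, timedelta

def item_click_rate_7d(events: list, item_id: str, as_of: datetime) -> float:
    """Clicks / impressions for one item over the 7 days before `as_of`.

    Training passes the historical example timestamp as `as_of`; serving passes "now".
    """
    start = as_of - timedelta(days=7)
    window = [e for e in events
              if e["item_id"] == item_id and start <= e["timestamp"] < as_of]
    impressions = sum(e["event_type"] == "impression" for e in window)
    clicks = sum(e["event_type"] == "click" for e in window)
    return clicks / impressions if impressions else 0.0
```

Passing `as_of` explicitly also keeps training from seeing events that happened after the example was served.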


1️⃣1️⃣ Online Serving Flow


Flow

User opens app
→ Recommendation request
→ Fetch user profile and context
→ Generate candidates from multiple sources
→ Fetch features
→ Rank candidates
→ Re-rank for diversity and rules
→ Return results
→ Log served recommendations

Latency Budget Example

Total budget: 100ms
Candidate generation: 30ms
Feature fetch: 30ms
Ranking: 30ms
Re-ranking + response: 10ms

👉 Interview Answer

Online serving must be low latency.

I would generate candidates from multiple sources, fetch features from a low-latency feature store, rank candidates using an online model, and then apply re-ranking rules.

The system must log what was served so we can later connect impressions to clicks and conversions.
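
The latency budget above can be tracked per stage. The sketch below runs the stages in order and records which ones exceeded their share; it only measures, it does not cancel slow stages. The stage names and the `run_with_budget` helper are hypothetical.

```python
import time

# Per-stage budget in milliseconds, taken from the example budget above.
STAGE_BUDGET_MS = {"candidates": 30, "features": 30, "ranking": 30, "rerank": 10}

def run_with_budget(stages: dict, budget_ms: dict = STAGE_BUDGET_MS):
    """Run stages in order, passing each output to the next, and record
    which stages exceeded their share of the latency budget."""
    value, over_budget = None, []
    for name, fn in stages.items():
        start = time.perf_counter()
        value = fn(value)
        elapsed_ms = (time.perf_counter() - start) * 1000
        if elapsed_ms > budget_ms[name]:
            over_budget.append(name)
    return value, over_budget
```

The `over_budget` list is the kind of signal that would feed latency dashboards or trigger a fallback path.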


1️⃣2️⃣ Feedback Loop


User Feedback Events

Positive signals:

Negative signals:


Feedback Pipeline

User interaction
→ Event ingestion
→ Stream processing
→ Feature updates
→ Training data generation
→ Model retraining
→ Model deployment

👉 Interview Answer

Recommendation systems rely on feedback loops.

User interactions are collected as events, processed into features, used for training data, and then fed back into future models.

Both positive and negative feedback are important for improving recommendation quality.


1️⃣3️⃣ Cold Start Problem


New User Cold Start

No behavior history.

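Strategies:

onboarding preferences
location
language
trending content

New Item Cold Start

No engagement data.

Strategies:

content metadata
embeddings
creator reputation
controlled exploration traffic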

👉 Interview Answer

Cold start is a major challenge.

For new users, I would use onboarding preferences, location, language, and trending content.

For new items, I would use content metadata, embeddings, creator reputation, and controlled exploration traffic.
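
For the new-user case, the choice of strategy can be a simple rule on how much history exists. This is a sketch only: the function name, the strategy labels, and the threshold of 5 events are assumptions, not values from any particular system.

```python
def pick_candidate_strategy(num_user_events: int,
                            has_onboarding_prefs: bool,
                            min_events: int = 5) -> str:
    """Choose how to generate candidates based on how much we know about the user."""
    if num_user_events >= min_events:
        return "personalized"
    if has_onboarding_prefs:
        return "onboarding_preferences"
    # No history and no stated preferences: fall back to popularity by locale.
    return "trending_by_region_and_language"
```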


1️⃣4️⃣ Exploration vs Exploitation


Exploitation

Show items the model already believes the user will like.

Pros:

Cons:


Exploration

Show uncertain or new items.

Pros:

Cons:


Strategy

Use:

mostly exploitation + small exploration percentage

Example:

90% ranked items
10% exploration items

👉 Interview Answer

Recommendation systems need to balance exploration and exploitation.

Exploitation maximizes known user preferences, while exploration discovers new interests and collects data for new items.

A small percentage of the feed can be reserved for exploration.
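
A minimal sketch of the 90/10 split, assuming exploration items are sampled uniformly from a separate pool and placed at random positions. The function name and parameters are hypothetical; production systems often use bandit-style methods instead of a fixed ratio.

```python
import random

def mix_exploration(ranked, explore_pool, size=10, explore_ratio=0.1, rng=None):
    """Fill `size` slots: most from the ranked list, a fixed share sampled
    uniformly from the exploration pool and placed at random positions."""
    rng = rng or random.Random()
    pool = [i for i in explore_pool if i not in ranked[:size]]
    n_explore = min(int(size * explore_ratio), len(pool))
    feed = list(ranked[: size - n_explore])
    for item in rng.sample(pool, n_explore):
        feed.insert(rng.randrange(len(feed) + 1), item)
    return feed
```

Ranked items keep their relative order; only the exploration slots are randomized, so the cost to relevance stays bounded by the ratio.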


1️⃣5️⃣ A/B Testing and Model Evaluation


Offline Metrics


Online Metrics

engagement
conversion
retention
negative feedback


A/B Testing Flow

User assigned to experiment
→ Recommendation service uses model variant
→ Log served items and user actions
→ Compare metrics between groups

👉 Interview Answer

Recommendation changes must be tested carefully.

Offline metrics are useful, but online A/B testing is required because user behavior and business metrics may differ from offline predictions.

I would compare engagement, conversion, retention, and negative feedback across experiment groups.
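
Experiment assignment is commonly done by hashing the user id together with the experiment id, so a user sees the same variant on every request without storing any state. The sketch below assumes two equally weighted variants; the function name is hypothetical.

```python
import hashlib

def assign_variant(user_id: str, experiment_id: str,
                   variants=("control", "treatment")) -> str:
    """Deterministic, roughly uniform assignment: the same user always gets the
    same variant within an experiment, independent of other experiments."""
    digest = hashlib.sha256(f"{experiment_id}:{user_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]
```

Including the experiment id in the hash input keeps assignments in one experiment from correlating with assignments in another.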


1️⃣6️⃣ Scaling Patterns


Pattern 1: Multi-stage Recommendation

Candidate generation → ranking → re-ranking

Pattern 2: Precompute Candidates

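Precompute:

candidate pools
embeddings

Pattern 3: Cache Hot Features

Cache:

frequently requested user and item features

Pattern 4: Separate Online and Offline Systems

Offline: training and feature generation
Online: low-latency serving

Pattern 5: Real-time Feature Updates

Use stream processing for:

recent_clicks
current_session_items
last_viewed_category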

👉 Interview Answer

To scale recommendations, I would use a multi-stage pipeline, precompute candidate pools and embeddings, cache hot features, and separate offline training from online serving.

Real-time features can be updated through stream processing to improve freshness.
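
As an illustration of a stream-updated feature, the sketch below maintains `recent_clicks` per user with a bounded buffer, updated one event at a time. The class name and in-memory storage are assumptions; a real stream job would write to the online feature store.

```python
from collections import defaultdict, deque

class RecentClicks:
    """Keeps the last `n` clicked item ids per user, updated one event at a time."""

    def __init__(self, n: int = 50):
        self._clicks = defaultdict(lambda: deque(maxlen=n))

    def on_event(self, event: dict) -> None:
        if event["eventType"] == "click":
            self._clicks[event["userId"]].append(event["itemId"])

    def get(self, user_id: str) -> list:
        # Most recent first; unknown users yield an empty list without creating state.
        return list(reversed(self._clicks.get(user_id, ())))
```

The bounded `deque` keeps memory per user constant no matter how active the user is.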


1️⃣7️⃣ Failure Handling


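Common Failures

ranking model unavailable
a candidate source fails

Strategies

fall back to cached recommendations
fall back to trending items
fall back to a simpler ranking model
continue with the remaining candidate sources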

👉 Interview Answer

Recommendation systems should degrade gracefully.

If the ranking model is unavailable, we can fall back to cached recommendations, trending items, or a simpler ranking model.

If one candidate source fails, the system can still use other sources.
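
Graceful degradation can be expressed as an ordered chain of strategies, sketched below. The function name, the strategy names, and the `served_by` field are hypothetical; the idea is simply to return the first strategy that produces a non-empty result.

```python
def recommend_with_fallback(user_id: str, strategies: list) -> dict:
    """Try each (name, fn) strategy in order; return the first non-empty result."""
    for name, fn in strategies:
        try:
            items = fn(user_id)
        except Exception:
            continue  # a real service would also log and emit a metric here
        if items:
            return {"items": items, "served_by": name}
    return {"items": [], "served_by": "none"}
```

Recording which strategy served the request makes it possible to monitor the fallback rate.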


1️⃣8️⃣ Consistency Model


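Stronger Consistency Needed For

privacy settings
blocked content
removed items
safety rules

Eventual Consistency Acceptable For

most recommendation features, e.g. a user's latest click reaching recommendations a few seconds later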

👉 Interview Answer

Most recommendation features can be eventually consistent.

It is acceptable if a user’s latest click updates recommendations a few seconds later.

But privacy settings, blocked content, removed items, and safety rules must be enforced correctly at serving time.
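
A minimal sketch of a serving-time filter, applied as the last step before the response. It assumes `item_status` and `blocked_by_user` are read from an authoritative store at request time rather than from a stale cache, and that `"active"` is the status value for servable items; both are assumptions for this example.

```python
def filter_at_serving_time(item_ids, item_status, blocked_by_user):
    """Drop items that are not active or that the user has blocked.

    Items with unknown status are dropped too, so a missing record fails closed.
    """
    return [
        i for i in item_ids
        if item_status.get(i) == "active" and i not in blocked_by_user
    ]
```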


1️⃣9️⃣ Observability


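System Metrics

latency
error rate
feature store performance
fallback rate

Quality Metrics

CTR
conversion
watch time
diversity
freshness
negative feedback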

👉 Interview Answer

I would monitor both system health and recommendation quality.

System metrics include latency, error rate, feature store performance, and fallback rate.

Quality metrics include CTR, conversion, watch time, diversity, freshness, and negative feedback.


2️⃣0️⃣ End-to-End Flow


Offline Training Flow

User events collected
→ Data pipeline cleans events
→ Generate training examples
→ Compute features
→ Train model
→ Evaluate model
→ Register model
→ Deploy model

Online Serving Flow

User opens app
→ Recommendation service receives request
→ Generate candidates
→ Fetch features
→ Rank candidates
→ Re-rank for diversity and safety
→ Return recommendations
→ Log impression

Feedback Flow

User interacts with item
→ Event logged
→ Stream updates real-time features
→ Batch pipeline updates training data
→ Future recommendations improve

Key Insight

Recommendation System is not just a model — it is a feedback-driven ranking platform.


🧠 Staff-Level Answer (Final)


👉 Interview Answer (Full Version)

When designing a recommendation system, I think of it as a feedback-driven personalized ranking platform.

The system has two main pipelines: an offline pipeline for training and feature generation, and an online pipeline for low-latency serving.

The offline pipeline collects user events, generates training data, computes user and item features, trains models, evaluates them, and deploys model versions.

The online pipeline receives a recommendation request, generates candidates from multiple sources, fetches features, ranks candidates using a model, and then re-ranks results for diversity, freshness, safety, and business rules.

Candidate generation is optimized for recall. It may use collaborative filtering, content-based similarity, embeddings, trending items, social graph signals, and recent user behavior.

Ranking is optimized for precision and predicts objectives such as click probability, watch time, purchase probability, or long-term engagement.

A feature store is important to provide consistent features for both training and serving and to avoid training-serving skew.

The system must handle cold start. For new users, I would use onboarding preferences, location, language, and trending items. For new items, I would use metadata, embeddings, creator reputation, and controlled exploration.

I would continuously evaluate models using offline metrics and online A/B testing, because offline performance does not always translate to better user behavior.

The main trade-offs are relevance, latency, diversity, freshness, exploration, and system cost.

Ultimately, the goal is to recommend the right item to the right user at the right time, while continuously learning from feedback.


⭐ Final Insight

The core of a Recommendation System is not a single model, but a continuously learning system made up of candidate generation, ranking, the feedback loop, and A/B testing.


