d&d-t System Design Deep Dive

🎯 Design Recommendation System

1️⃣ Core Framework

When discussing Recommendation System design, I frame it as:

User behavior collection
Offline feature and model pipeline
Candidate generation
Ranking and re-ranking
Online serving architecture
Feedback loop and experimentation
Cold start and freshness
Trade-offs: relevance vs latency vs diversity

2️⃣ Core Requirements

Functional Requirements

Recommend items to users
Support personalized recommendations
Support trending / popular recommendations
Support real-time behavior signals
Support multiple surfaces:
- Home feed
- Product recommendations
- Video recommendations
- “You may also like”
Support feedback:
- Click
- View
- Like
- Purchase
- Hide
- Skip
Support A/B testing

Non-functional Requirements

Low-latency serving
High availability
Scalable offline training
Near-real-time feature updates
High relevance
Diversity and freshness
Explainability and monitoring

👉 Interview Answer

A recommendation system is a personalized ranking system.

It collects user behavior, generates candidate items, ranks them using user and item features, and continuously improves through feedback.

The main challenge is balancing relevance, latency, diversity, freshness, and system cost.

3️⃣ Main APIs

Get Recommendations

GET /api/recommendations?userId=u123&surface=home&limit=20

Response:

{
  "items": [
    {
      "itemId": "i789",
      "score": 0.94,
      "reason": "Because you watched similar videos"
    }
  ]
}

Track User Event

POST /api/events

Request:

{
  "userId": "u123",
  "itemId": "i789",
  "eventType": "click",
  "timestamp": "2026-05-02T10:00:00Z",
  "context": {
    "surface": "home",
    "device": "mobile"
  }
}

Feedback API

POST /api/recommendations/feedback

Request:

{
  "userId": "u123",
  "itemId": "i789",
  "feedback": "not_interested"
}

👉 Interview Answer

I would expose a recommendation serving API and separate event tracking APIs.

Recommendation serving must be low latency, while user events can be processed asynchronously for feature updates, model training, and analytics.
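As a rough sketch of the asynchronous event path, the tracking endpoint can validate the payload and hand it to a queue without doing any feature work inline. The `UserEvent` class, the `ALLOWED_EVENT_TYPES` set, and the in-memory `queue.Queue` (standing in for a real message broker) are illustrative assumptions, not part of the API above.

```python
import json
import queue
from dataclasses import dataclass, field
from datetime import datetime

# Hypothetical set of accepted event types, taken from the feedback list above.
ALLOWED_EVENT_TYPES = {"click", "view", "like", "purchase", "hide", "skip"}


@dataclass(frozen=True)
class UserEvent:
    user_id: str
    item_id: str
    event_type: str
    timestamp: datetime
    context: dict = field(default_factory=dict)


def parse_event(raw: str) -> UserEvent:
    """Validate a POST /api/events body and convert it to a UserEvent."""
    body = json.loads(raw)
    if body["eventType"] not in ALLOWED_EVENT_TYPES:
        raise ValueError(f"unknown eventType: {body['eventType']}")
    # Older Python versions of fromisoformat do not accept a trailing "Z".
    ts = datetime.fromisoformat(body["timestamp"].replace("Z", "+00:00"))
    return UserEvent(body["userId"], body["itemId"], body["eventType"], ts,
                     body.get("context", {}))


# In-memory stand-in for a message broker; consumers process events later.
event_queue: "queue.Queue[UserEvent]" = queue.Queue()


def track_event(raw: str) -> int:
    """Enqueue the event and return an HTTP-style status code immediately."""
    try:
        event_queue.put(parse_event(raw))
    except (KeyError, TypeError, ValueError):
        return 400  # malformed or incomplete payload
    return 202  # accepted for asynchronous processing
```

Returning 202 rather than 200 signals that the event was accepted but not yet processed, which matches the asynchronous design.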

4️⃣ Data Model

User Profile

user_profile (
  user_id VARCHAR PRIMARY KEY,
  age_group VARCHAR,
  region VARCHAR,
  language VARCHAR,
  interests JSON,
  created_at TIMESTAMP,
  updated_at TIMESTAMP
)

Item Table

item (
  item_id VARCHAR PRIMARY KEY,
  item_type VARCHAR,
  title TEXT,
  category VARCHAR,
  tags ARRAY,
  creator_id VARCHAR,
  status VARCHAR,
  created_at TIMESTAMP,
  metadata JSON
)

User Event Table

user_event (
  event_id VARCHAR PRIMARY KEY,
  user_id VARCHAR,
  item_id VARCHAR,
  event_type VARCHAR,
  timestamp TIMESTAMP,
  context JSON
)

Feature Store

feature_store (
  entity_id VARCHAR,
  entity_type VARCHAR,
  feature_name VARCHAR,
  feature_value JSON,
  updated_at TIMESTAMP,
  PRIMARY KEY (entity_id, entity_type, feature_name)
)

Recommendation Log

recommendation_log (
  request_id VARCHAR PRIMARY KEY,
  user_id VARCHAR,
  item_ids ARRAY,
  model_version VARCHAR,
  experiment_id VARCHAR,
  served_at TIMESTAMP
)

👉 Interview Answer

I would store user profiles, item metadata, user interaction events, features, and recommendation logs separately.

User events are the raw signal. The feature store serves both online and offline features. Recommendation logs are important for debugging, training data generation, and A/B test analysis.

5️⃣ High-Level Architecture

User Events
→ Event Ingestion
→ Stream Processing
→ Feature Store
→ Offline Training Pipeline
→ Model Registry

Online Request
→ Recommendation Service
→ Candidate Generation
→ Feature Fetching
→ Ranking Model
→ Re-ranking
→ Response
→ Logging / Feedback Loop

Two Main Pipelines

Offline Pipeline

Process historical data
Build embeddings
Train ranking models
Generate candidate indexes
Update feature store

Online Pipeline

Receive recommendation request
Fetch user context
Generate candidates
Rank candidates
Apply business rules
Return recommendations

👉 Interview Answer

I would separate recommendation into offline and online pipelines.

Offline pipelines process historical behavior, train models, generate embeddings, and compute features.

Online pipelines serve recommendations in real time using candidate generation, feature fetching, ranking, and re-ranking.

6️⃣ Candidate Generation

Goal

Reduce millions of items to a few hundred or thousand candidates.

millions of items → 500 candidates

Candidate Sources

Collaborative Filtering

Recommend items that similar users engaged with.

users like you also liked X

Content-based Recommendation

Recommend items similar to what the user liked.

similar category / tags / embeddings

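Trending / Popular

Recommend globally or regionally popular items.

Social Graph

Recommend items liked or shared by friends.

Recently Viewed / Purchased

Recommend items related to what the user recently viewed or purchased.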

Ads / Sponsored Items

May be inserted as a separate candidate source.

👉 Interview Answer

Candidate generation retrieves a broad set of potentially relevant items.

I would combine multiple candidate sources, such as collaborative filtering, content-based similarity, trending items, social graph signals, and recent user behavior.

The goal is high recall, because ranking will refine the final order.
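A minimal sketch of how several candidate sources might be merged. The function name, the per-source and total limits, and the idea of passing sources as plain callables are assumptions for illustration; a real system would likely call the sources in parallel.

```python
from typing import Callable, Dict, List

# A source takes a user ID and returns item IDs, best first.
CandidateSource = Callable[[str], List[str]]


def generate_candidates(user_id: str,
                        sources: Dict[str, CandidateSource],
                        per_source_limit: int = 200,
                        total_limit: int = 500) -> List[str]:
    """Union candidates from every source, deduplicated, in first-seen order."""
    seen = set()
    merged: List[str] = []
    for name, fetch in sources.items():
        try:
            items = fetch(user_id)[:per_source_limit]
        except Exception:
            # A failing source is skipped so the others can still contribute.
            continue
        for item_id in items:
            if item_id not in seen:
                seen.add(item_id)
                merged.append(item_id)
    return merged[:total_limit]
```

Deduplicating here keeps the ranking stage from scoring the same item twice when, for example, a trending item is also a collaborative-filtering hit.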

7️⃣ Embeddings and Similarity Search

User Embedding

Represents user interests.

user_id → dense vector

Item Embedding

Represents item semantics.

item_id → dense vector

Retrieval

Use approximate nearest neighbor search:

user_embedding → top similar item embeddings

Examples:

HNSW
IVF
FAISS
ScaNN
Vector DB

👉 Interview Answer

Embeddings are commonly used for recommendation retrieval.

We can represent users and items as dense vectors, then use approximate nearest neighbor search to quickly find items similar to the user’s interests.

This is useful for large-scale candidate generation.
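To make the retrieval step concrete, here is a small exact nearest-neighbor search by cosine similarity. It is a brute-force stand-in, not an ANN index: libraries such as FAISS or ScaNN trade a little accuracy for sub-linear lookups, while this sketch scans every item. The function names and the toy vectors are assumptions.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_items(user_vec: Sequence[float],
                item_vecs: Dict[str, Sequence[float]],
                k: int) -> List[Tuple[str, float]]:
    """Exact nearest neighbors by cosine similarity (one full scan per query)."""
    scored = ((item_id, cosine(user_vec, vec)) for item_id, vec in item_vecs.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])
```

The full scan is fine for thousands of items; at millions of items the same interface would sit in front of an approximate index.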

8️⃣ Ranking

Goal

Order candidates by predicted user value.

Common Prediction Targets

Click probability
Watch time
Purchase probability
Conversion probability
Long-term engagement
Retention impact

Ranking Signals

User interests
User recent behavior
Item category
Item popularity
Freshness
Creator quality
Context:
- Time of day
- Device
- Location
Negative feedback
Business constraints

Ranking Pipeline

Candidates
→ Feature Fetching
→ Lightweight Ranking
→ ML Ranking
→ Re-ranking
→ Final List

👉 Interview Answer

Ranking predicts how valuable each candidate is for the user.

The ranking model may optimize for click probability, watch time, purchase probability, or long-term engagement.

It uses user features, item features, context features, and historical behavior signals.
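A minimal sketch of the scoring step, assuming a logistic model over a handful of named features. The feature names, weights, and bias below are hand-set placeholders for illustration; a real ranker would learn them from logged interactions and would usually be a much larger model.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical hand-set weights; a trained model would learn these.
WEIGHTS = {
    "category_match": 2.0,
    "item_popularity": 1.0,
    "freshness": 0.5,
    "negative_feedback": -3.0,
}
BIAS = -1.0


def predict_click_probability(features: Dict[str, float]) -> float:
    """Logistic model: sigmoid(bias + sum of weight * feature value)."""
    z = BIAS + sum(w * features.get(name, 0.0) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def rank(candidates: Dict[str, Dict[str, float]]) -> List[Tuple[str, float]]:
    """Sort candidates by predicted click probability, highest first."""
    scored = [(item_id, predict_click_probability(f)) for item_id, f in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Note how the negative feedback feature pulls an otherwise matching item below an item the model knows nothing about.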

9️⃣ Re-ranking

Why Re-ranking?

Ranking purely by model score may produce a bad user experience.

Problems:

Too many similar items
Too many items from the same creator
Too many old items
Unsafe content
Over-optimization for clicks
Filter bubble

Re-ranking Goals

Diversity
Freshness
Safety
Business rules
Creator fairness
Exploration
Ad insertion
Deduplication

Example

The final feed should not show 10 videos from the same creator in a row.

👉 Interview Answer

Re-ranking is applied after model scoring to improve the final user experience.

It enforces diversity, freshness, safety, deduplication, and business constraints.

This prevents the system from simply showing the highest-scoring but repetitive results.
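One way to enforce the creator rule is a greedy pass over the scored list. This is a sketch under assumptions: the function name, the `creator_of` lookup, and the default cap of two consecutive items are illustrative, and production re-rankers often combine several such constraints.

```python
from typing import Dict, List


def rerank_with_creator_cap(ranked_items: List[str],
                            creator_of: Dict[str, str],
                            max_run: int = 2) -> List[str]:
    """Greedy re-rank: avoid more than max_run consecutive items from one
    creator as long as an item from a different creator is still available."""
    remaining = list(ranked_items)  # already sorted by model score, best first
    result: List[str] = []
    while remaining:
        tail = [creator_of[i] for i in result[-max_run:]]
        run_is_full = len(tail) == max_run and len(set(tail)) == 1
        blocked = tail[0] if run_is_full else None
        # Best-scoring item that does not extend a full run; if every
        # remaining item would, fall back to the best-scoring one.
        pick = next((i for i in remaining if creator_of[i] != blocked), remaining[0])
        remaining.remove(pick)
        result.append(pick)
    return result
```

The greedy choice keeps the output as close to model order as the constraint allows, so relevance is only sacrificed where repetition would otherwise occur.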

🔟 Feature Store

Why Needed?

Ranking requires many features.

Examples:

User features
Item features
User-item interaction features
Real-time session features

Offline Features

Computed from historical data.

Examples:

user_favorite_categories_30d
item_click_rate_7d
creator_quality_score

Online Features

Updated in near real time.

Examples:

recent_clicks
current_session_items
last_viewed_category

Feature Consistency

Important problem:

training-serving skew

This happens when training features and serving features are computed differently.

👉 Interview Answer

A feature store helps provide consistent features for both training and serving.

Offline features are computed from historical data, while online features capture recent user behavior.

Avoiding training-serving skew is important, because inconsistent features can hurt model quality.
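A common way to limit training-serving skew is to define each feature once and call the same function from both paths. The sketch below assumes a simple `(item_id, event_type, timestamp)` event tuple and a click-over-view definition of `item_click_rate_7d`; both are illustrative choices.

```python
from datetime import datetime, timedelta
from typing import Iterable, Tuple

# (item_id, event_type, timestamp); event types follow the tracking API.
Event = Tuple[str, str, datetime]


def item_click_rate_7d(item_id: str, events: Iterable[Event], as_of: datetime) -> float:
    """Clicks divided by views for one item over the 7 days ending at `as_of`."""
    start = as_of - timedelta(days=7)
    clicks = views = 0
    for event_item, event_type, ts in events:
        if event_item != item_id or not (start < ts <= as_of):
            continue
        if event_type == "click":
            clicks += 1
        elif event_type == "view":
            views += 1
    return clicks / views if views else 0.0
```

Serving would call this with `as_of` set to the current time, while training would pass the timestamp of each logged impression, so the model never sees events from the future relative to the example.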

1️⃣1️⃣ Online Serving Flow

Flow

User opens app
→ Recommendation request
→ Fetch user profile and context
→ Generate candidates from multiple sources
→ Fetch features
→ Rank candidates
→ Re-rank for diversity and rules
→ Return results
→ Log served recommendations

Latency Budget Example

Total budget: 100ms
Candidate generation: 30ms
Feature fetch: 30ms
Ranking: 30ms
Re-ranking + response: 10ms

👉 Interview Answer

Online serving must be low latency.

I would generate candidates from multiple sources, fetch features from a low-latency feature store, rank candidates using an online model, and then apply re-ranking rules.

The system must log what was served so we can later connect impressions to clicks and conversions.
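To illustrate why the served log matters, here is a sketch that joins impressions to clicks and computes CTR. It assumes click events carry the `request_id` of the response that showed the item; that field is not in the event schema above and is added here only for illustration.

```python
from typing import Dict, Iterable, List, Set, Tuple


def click_through_rate(served: Dict[str, List[str]],
                       clicks: Iterable[Tuple[str, str]]) -> float:
    """served maps request_id -> item_ids shown; clicks are (request_id, item_id).

    Only clicks on items that were actually served for that request count,
    and repeated clicks on the same impression count once.
    """
    impressions: Set[Tuple[str, str]] = {
        (request_id, item_id)
        for request_id, items in served.items()
        for item_id in items
    }
    attributed = {click for click in clicks if click in impressions}
    return len(attributed) / len(impressions) if impressions else 0.0
```

Without the served log there is no denominator: clicks alone cannot distinguish an item nobody liked from an item nobody was shown.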

1️⃣2️⃣ Feedback Loop

User Feedback Events

Positive signals:

Click
Like
Share
Purchase
Long watch time

Negative signals:

Hide
Skip
Short watch time
Report
Not interested

Feedback Pipeline

User interaction
→ Event ingestion
→ Stream processing
→ Feature updates
→ Training data generation
→ Model retraining
→ Model deployment

👉 Interview Answer

Recommendation systems rely on feedback loops.

User interactions are collected as events, processed into features, used for training data, and then fed back into future models.

Both positive and negative feedback are important for improving recommendation quality.
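The training data generation step can be sketched as a mapping from raw events to labeled rows. The label sets follow the signal lists above; the `watch` event type, the `watchSeconds` field, and the 30-second threshold are assumptions made for this example.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical label mapping based on the positive and negative signal lists.
POSITIVE = {"click", "like", "share", "purchase"}
NEGATIVE = {"hide", "skip", "report", "not_interested"}
LONG_WATCH_SECONDS = 30.0  # assumed threshold; tune per product


def label_event(event_type: str, watch_seconds: Optional[float] = None) -> Optional[int]:
    """Return 1 for positive feedback, 0 for negative, None if uninformative."""
    if event_type == "watch" and watch_seconds is not None:
        return 1 if watch_seconds >= LONG_WATCH_SECONDS else 0
    if event_type in POSITIVE:
        return 1
    if event_type in NEGATIVE:
        return 0
    return None


def build_training_examples(events: List[Dict]) -> List[Tuple[str, str, int]]:
    """Convert raw events into (user_id, item_id, label) rows, dropping unlabeled ones."""
    rows = []
    for event in events:
        label = label_event(event["eventType"], event.get("watchSeconds"))
        if label is not None:
            rows.append((event["userId"], event["itemId"], label))
    return rows
```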

1️⃣3️⃣ Cold Start Problem

New User Cold Start

No behavior history.

Strategies:

Ask onboarding preferences
Use location / language / device
Use trending items
Use demographic-level recommendations
Explore diverse content

New Item Cold Start

No engagement data.

Strategies:

Use content metadata
Use item embeddings
Boost new content temporarily
Explore with small traffic
Use creator reputation

👉 Interview Answer

Cold start is a major challenge.

For new users, I would use onboarding preferences, location, language, and trending content.

For new items, I would use content metadata, embeddings, creator reputation, and controlled exploration traffic.
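For the new-user case, the strategies can be arranged as a fallback chain from most to least personalized. The function signature, the dictionary lookups, and the `"global"` key are assumptions for this sketch.

```python
from typing import Dict, List, Optional


def cold_start_candidates(history: List[str],
                          onboarding_categories: Optional[List[str]],
                          region: str,
                          items_by_category: Dict[str, List[str]],
                          trending_by_region: Dict[str, List[str]],
                          limit: int = 20) -> List[str]:
    """Pick a candidate strategy based on how much is known about the user."""
    if history:
        # Not a cold-start user; the personalized pipeline should handle this.
        return []
    if onboarding_categories:
        picks: List[str] = []
        for category in onboarding_categories:
            picks.extend(items_by_category.get(category, []))
        if picks:
            return picks[:limit]
    # Last resort: regional trending, then global trending.
    trending = trending_by_region.get(region) or trending_by_region.get("global", [])
    return trending[:limit]
```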

1️⃣4️⃣ Exploration vs Exploitation

Exploitation

Show items the model already believes the user will like.

Pros:

Higher short-term engagement

Cons:

Less discovery
Filter bubble

Exploration

Show uncertain or new items.

Pros:

Discover new interests
Collect training data
Help new items

Cons:

May reduce short-term engagement

Strategy

Use:

mostly exploitation + small exploration percentage

Example:

90% ranked items
10% exploration items

👉 Interview Answer

Recommendation systems need to balance exploration and exploitation.

Exploitation maximizes known user preferences, while exploration discovers new interests and collects data for new items.

A small percentage of the feed can be reserved for exploration.
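A simple way to reserve that share is to give every tenth slot to an exploration item. This is a deterministic slotting sketch, not a bandit algorithm; the function name and the `every_n` parameter are assumptions.

```python
from typing import List


def mix_exploration(ranked: List[str], exploration: List[str],
                    feed_size: int = 20, every_n: int = 10) -> List[str]:
    """Fill every `every_n`-th slot with an exploration item and the rest
    from `ranked`. With every_n=10, about 10% of the feed is exploration."""
    ranked_set = set(ranked)
    ranked_iter = iter(ranked)
    # Skip exploration items that the ranker already surfaced.
    explore_iter = iter([i for i in exploration if i not in ranked_set])
    feed: List[str] = []
    for slot in range(1, feed_size + 1):
        is_explore_slot = slot % every_n == 0
        first, second = ((explore_iter, ranked_iter) if is_explore_slot
                         else (ranked_iter, explore_iter))
        item = next(first, None)
        if item is None:
            item = next(second, None)  # backfill from the other pool
        if item is None:
            break  # both pools are exhausted
        feed.append(item)
    return feed
```

Fixed slots make the exploration share easy to reason about in experiments; a bandit-style policy could later replace the slot rule without changing the interface.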

1️⃣5️⃣ A/B Testing and Model Evaluation

Offline Metrics

AUC
Precision@K
Recall@K
NDCG
MAP
Loss
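As a worked example of three of these metrics, the sketch below computes Precision@K, Recall@K, and NDCG@K with binary relevance (an item is either relevant or not). Graded relevance would change the gain term; the binary form is an assumption chosen to keep the example short.

```python
import math
from typing import List, Set


def precision_at_k(recommended: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k recommendations that are relevant."""
    return sum(1 for item in recommended[:k] if item in relevant) / k


def recall_at_k(recommended: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of all relevant items that appear in the top k."""
    if not relevant:
        return 0.0
    return sum(1 for item in recommended[:k] if item in relevant) / len(relevant)


def ndcg_at_k(recommended: List[str], relevant: Set[str], k: int) -> float:
    """Binary-relevance NDCG: DCG of the list divided by DCG of an ideal ordering."""
    dcg = sum(1.0 / math.log2(position + 2)
              for position, item in enumerate(recommended[:k]) if item in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(position + 2) for position in range(ideal_hits))
    return dcg / idcg if idcg else 0.0
```

Precision and recall ignore order within the top K, while NDCG rewards placing relevant items nearer the top.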

Online Metrics

CTR
Watch time
Conversion rate
Revenue
Retention
Session length
Hide / report rate

A/B Testing Flow

User assigned to experiment
→ Recommendation service uses model variant
→ Log served items and user actions
→ Compare metrics between groups

👉 Interview Answer

Recommendation changes must be tested carefully.

Offline metrics are useful, but online A/B testing is required because user behavior and business metrics may differ from offline predictions.

I would compare engagement, conversion, retention, and negative feedback across experiment groups.
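Experiment assignment is often done by hashing, so the same user always lands in the same group without a lookup table. The sketch below assumes SHA-256 over the experiment and user IDs and 100 equal buckets; the function name and variant labels are illustrative.

```python
import hashlib
from typing import Sequence


def assign_variant(user_id: str, experiment_id: str,
                   variants: Sequence[str] = ("control", "treatment")) -> str:
    """Deterministically map a user to a variant by hashing both IDs."""
    digest = hashlib.sha256(f"{experiment_id}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # 100 equal-sized buckets
    # Split the 100 buckets as evenly as possible across the variants.
    return variants[bucket * len(variants) // 100]
```

Including the experiment ID in the hash input keeps assignments independent across experiments, so a user in the treatment group of one test is not systematically in the treatment group of another.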

1️⃣6️⃣ Scaling Patterns

Pattern 1: Multi-stage Recommendation

Candidate generation → ranking → re-ranking

Pattern 2: Precompute Candidates

Precompute:

Similar items
User embeddings
Item embeddings
Popular items
User candidate pools

Pattern 3: Cache Hot Features

Cache:

User features
Item features
Popular item lists
Embeddings

Pattern 4: Separate Online and Offline Systems

Offline = batch training and feature generation
Online = low-latency serving

Pattern 5: Real-time Feature Updates

Use stream processing for:

Recent clicks
Recent views
Session behavior
Trending items

👉 Interview Answer

To scale recommendations, I would use a multi-stage pipeline, precompute candidate pools and embeddings, cache hot features, and separate offline training from online serving.

Real-time features can be updated through stream processing to improve freshness.
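As one example of a stream-updated feature, trending items can be tracked with a sliding-window counter. This single-process sketch assumes events arrive in timestamp order; a real stream processor would shard the state by item and handle late events.

```python
from collections import Counter, deque
from typing import Deque, List, Tuple


class TrendingCounter:
    """Counts item events inside a sliding time window (timestamps in seconds)."""

    def __init__(self, window_seconds: float) -> None:
        self.window = window_seconds
        self.events: Deque[Tuple[float, str]] = deque()
        self.counts: Counter = Counter()

    def add(self, item_id: str, ts: float) -> None:
        """Record one event; events are assumed to arrive in timestamp order."""
        self.events.append((ts, item_id))
        self.counts[item_id] += 1
        self._evict(ts)

    def _evict(self, now: float) -> None:
        # Drop events that have aged out of the window.
        while self.events and self.events[0][0] <= now - self.window:
            _, old_item = self.events.popleft()
            self.counts[old_item] -= 1
            if self.counts[old_item] == 0:
                del self.counts[old_item]

    def top(self, k: int, now: float) -> List[Tuple[str, int]]:
        """Return the k most frequent items within the window ending at `now`."""
        self._evict(now)
        return self.counts.most_common(k)
```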

1️⃣7️⃣ Failure Handling

Common Failures

Candidate service timeout
Feature store unavailable
Ranking model timeout
Model deployment issue
Event ingestion lag
Bad recommendations
Cold start fallback needed

Strategies

Fallback to popular items
Use cached recommendations
Skip failed candidate source
Use simpler ranking model
Circuit breaker around model service
Roll back model version
Monitor negative feedback

👉 Interview Answer

Recommendation systems should degrade gracefully.

If the ranking model is unavailable, we can fall back to cached recommendations, trending items, or a simpler ranking model.

If one candidate source fails, the system can still use other sources.
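The circuit breaker from the strategy list can be sketched in a few lines. The class below is a simplified illustration: the threshold, the reset interval, and passing `now` explicitly (to keep the example deterministic) are all assumptions.

```python
from typing import Callable, List, Optional


class CircuitBreaker:
    """Stops calling a failing dependency for `reset_after` seconds."""

    def __init__(self, failure_threshold: int = 3, reset_after: float = 30.0) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, primary: Callable[[], List[str]],
             fallback: Callable[[], List[str]], now: float) -> List[str]:
        # While open, skip the primary call entirely and serve the fallback.
        if self.opened_at is not None and now - self.opened_at < self.reset_after:
            return fallback()
        try:
            result = primary()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = now  # open (or re-open) the circuit
            return fallback()
        # A success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Here `primary` would wrap the ranking model call and `fallback` would return cached or popular items, so users still get a response while the model service recovers.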

1️⃣8️⃣ Consistency Model

Stronger Consistency Needed For

User privacy settings
Blocked content
Removed items
Safety policy decisions
Experiment assignment

Eventual Consistency Acceptable For

User interest updates
Ranking features
Popularity counters
Embeddings
Training data
Analytics

👉 Interview Answer

Most recommendation features can be eventually consistent.

It is acceptable if a user’s latest click updates recommendations a few seconds later.

But privacy settings, blocked content, removed items, and safety rules must be enforced correctly at serving time.

1️⃣9️⃣ Observability

System Metrics

Recommendation API latency
Candidate generation latency
Feature store latency
Ranking latency
Error rate
Cache hit rate
Fallback rate
Model timeout rate

Quality Metrics

CTR
Conversion rate
Watch time
Retention
Diversity
Freshness
Negative feedback rate
Coverage
Novelty

👉 Interview Answer

I would monitor both system health and recommendation quality.

System metrics include latency, error rate, feature store performance, and fallback rate.

Quality metrics include CTR, conversion, watch time, diversity, freshness, and negative feedback.
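Two of the less familiar quality metrics can be computed directly from the recommendation log. The definitions below are one common choice and are assumptions here: coverage as the share of the catalog that was served at least once, and diversity as distinct categories per list.

```python
from typing import Dict, Iterable, List


def catalog_coverage(served_lists: Iterable[List[str]], catalog_size: int) -> float:
    """Share of the catalog that appeared in at least one served list."""
    distinct = {item for items in served_lists for item in items}
    return len(distinct) / catalog_size if catalog_size else 0.0


def category_diversity(items: List[str], category_of: Dict[str, str]) -> float:
    """Distinct categories divided by list length (1.0 means all different)."""
    if not items:
        return 0.0
    return len({category_of[i] for i in items}) / len(items)
```

A falling coverage number is an early sign that the system is concentrating traffic on a small set of popular items.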

2️⃣0️⃣ End-to-End Flow

Offline Training Flow

User events collected
→ Data pipeline cleans events
→ Generate training examples
→ Compute features
→ Train model
→ Evaluate model
→ Register model
→ Deploy model

Online Serving Flow

User opens app
→ Recommendation service receives request
→ Generate candidates
→ Fetch features
→ Rank candidates
→ Re-rank for diversity and safety
→ Return recommendations
→ Log impression

Feedback Flow

User interacts with item
→ Event logged
→ Stream updates real-time features
→ Batch pipeline updates training data
→ Future recommendations improve

Key Insight

A recommendation system is not just a model; it is a feedback-driven ranking platform.

🧠 Staff-Level Answer (Final)

👉 Interview Answer (Full Version)

When designing a recommendation system, I think of it as a feedback-driven personalized ranking platform.

The system has two main pipelines: an offline pipeline for training and feature generation, and an online pipeline for low-latency serving.

The offline pipeline collects user events, generates training data, computes user and item features, trains models, evaluates them, and deploys model versions.

The online pipeline receives a recommendation request, generates candidates from multiple sources, fetches features, ranks candidates using a model, and then re-ranks results for diversity, freshness, safety, and business rules.

Candidate generation is optimized for recall. It may use collaborative filtering, content-based similarity, embeddings, trending items, social graph signals, and recent user behavior.

Ranking is optimized for precision and predicts objectives such as click probability, watch time, purchase probability, or long-term engagement.

A feature store is important to provide consistent features for both training and serving and to avoid training-serving skew.

The system must handle cold start. For new users, I would use onboarding preferences, location, language, and trending items. For new items, I would use metadata, embeddings, creator reputation, and controlled exploration.

I would continuously evaluate models using offline metrics and online A/B testing, because offline performance does not always translate to better user behavior.

The main trade-offs are relevance, latency, diversity, freshness, exploration, and system cost.

Ultimately, the goal is to recommend the right item to the right user at the right time, while continuously learning from feedback.

⭐ Final Insight

The core of a recommendation system is not a single model, but a continuously learning system made up of candidate generation, ranking, a feedback loop, and A/B testing.
