
🎯 Design an AI Recommendation System with LLMs

1️⃣ Core Framework

When designing an AI Recommendation System with LLMs, I frame it as:

Product requirements
User and item modeling
Candidate generation
Ranking and personalization
LLM-based reasoning layer
Feedback loop
Safety and fairness
Trade-offs: relevance vs latency vs cost

2️⃣ Product Goal

An AI recommendation system suggests relevant items to users.

Examples:

Products
Videos
Articles
Courses
Jobs
Restaurants
Games
Financial products
Internal documents

Basic Flow

User Context
→ Candidate Retrieval
→ Ranking
→ LLM Reasoning / Explanation
→ Personalized Recommendations
→ Feedback Loop

👉 Interview Answer

An AI recommendation system uses user behavior, item metadata, embeddings, ranking models, and sometimes LLM reasoning to recommend relevant items.

The LLM is usually not the whole recommender.

It is often used for understanding intent, enriching metadata, explaining recommendations, or re-ranking small candidate sets.

3️⃣ Functional Requirements

Core Features

The system should support:

Personalized recommendations
Similar item recommendations
Search-based recommendations
Trending recommendations
Cold-start recommendations
Explanation for recommendations
User feedback
Filtering and constraints

Examples

Recommend products similar to this item.

Recommend videos based on watch history.

Recommend jobs based on resume and preferences.

Recommend documents based on current task.

👉 Interview Answer

Core requirements include personalized recommendations, similar-item recommendations, cold-start handling, ranking, feedback collection, filtering, and recommendation explanations.

4️⃣ Non-functional Requirements

Important System Qualities

The system should optimize for:

Relevance
Low latency
Scalability
Freshness
Diversity
Fairness
Explainability
Cost efficiency
Privacy

Key Trade-off

More personalization and LLM reasoning
→ Better quality

But also
→ Higher latency and cost

👉 Interview Answer

Non-functional requirements include relevance, latency, scalability, freshness, diversity, fairness, explainability, privacy, and cost efficiency.

The core trade-off is recommendation quality versus latency and cost.

5️⃣ High-Level Architecture

Architecture

Client
→ Recommendation API
→ User Profile Service
→ Candidate Generation
→ Feature Store
→ Ranking Service
→ LLM Reasoning / Explanation Service
→ Business Rules / Safety Filters
→ Response
→ Feedback Pipeline

Core Components

User Profile Service

Stores user preferences and behavior.

Candidate Generation

Finds possible recommendation items.

Ranking Service

Scores and orders candidates.

LLM Layer

Explains, re-ranks, or personalizes recommendations.

Feedback Pipeline

Collects clicks, views, purchases, likes, skips, and conversions.

👉 Interview Answer

A production recommendation system usually includes user profiles, item catalogs, candidate generation, feature stores, ranking models, business filters, an optional LLM reasoning layer, and a feedback pipeline.

6️⃣ Data Model

Main Entities

User
Item
UserEvent
RecommendationRequest
CandidateItem
RankedItem
FeedbackEvent

User Profile Example

{
  "user_id": "user_123",
  "interests": ["machine learning", "system design"],
  "recent_views": ["item_1", "item_2"],
  "purchased_items": ["item_3"],
  "negative_feedback": ["item_9"]
}

Item Metadata Example

{
  "item_id": "item_456",
  "title": "Distributed Systems Course",
  "category": "education",
  "tags": ["backend", "scalability", "architecture"],
  "embedding": [0.12, -0.44, 0.88]
}

👉 Interview Answer

I would model users, items, user events, candidate items, ranked items, and feedback events.

User profiles capture preferences and behavior, while item metadata captures categories, tags, text, images, embeddings, and business attributes.
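The two JSON examples above can be mirrored as typed records. This is a minimal sketch using Python dataclasses; the field names come from the examples, while the `FeedbackEvent` fields (`event_type`, `timestamp`) are assumptions added for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class UserProfile:
    user_id: str
    interests: list[str] = field(default_factory=list)
    recent_views: list[str] = field(default_factory=list)
    purchased_items: list[str] = field(default_factory=list)
    negative_feedback: list[str] = field(default_factory=list)


@dataclass
class Item:
    item_id: str
    title: str
    category: str
    tags: list[str] = field(default_factory=list)
    embedding: list[float] = field(default_factory=list)


@dataclass
class FeedbackEvent:
    # Assumed shape: the notes name the entity but do not define its fields.
    user_id: str
    item_id: str
    event_type: str  # e.g. "impression", "click", "purchase", "skip"
    timestamp: float


user = UserProfile(
    user_id="user_123",
    interests=["machine learning", "system design"],
    recent_views=["item_1", "item_2"],
    purchased_items=["item_3"],
    negative_feedback=["item_9"],
)
item = Item(
    item_id="item_456",
    title="Distributed Systems Course",
    category="education",
    tags=["backend", "scalability", "architecture"],
    embedding=[0.12, -0.44, 0.88],
)
```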

7️⃣ User Modeling

What a User Profile Includes

A user profile may include:

Explicit preferences
Click history
View history
Purchase history
Search history
Likes and dislikes
Recent session behavior
Long-term interests
Negative feedback

Short-term vs Long-term Preference

Short-term:
What the user is doing now.

Long-term:
What the user generally likes.

👉 Interview Answer

User modeling combines long-term preferences and short-term session intent.

The long-term profile captures stable interests, while short-term behavior captures what the user wants right now.
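One simple way to combine the two signals is a weighted blend of a long-term interest vector and a session vector. The sketch below assumes both are already embedded in the same space; `blend_preferences` and `session_weight` are hypothetical names, and the linear blend is an illustration rather than the only option.

```python
def blend_preferences(long_term, short_term, session_weight=0.6):
    """Linear blend of two equal-length preference vectors."""
    if len(long_term) != len(short_term):
        raise ValueError("vectors must have the same length")
    if not 0.0 <= session_weight <= 1.0:
        raise ValueError("session_weight must be in [0, 1]")
    return [
        session_weight * st + (1.0 - session_weight) * lt
        for lt, st in zip(long_term, short_term)
    ]


long_term = [1.0, 0.0, 0.0]   # stable interest, e.g. "system design"
short_term = [0.0, 1.0, 0.0]  # current session, e.g. browsing laptops
blended = blend_preferences(long_term, short_term, session_weight=0.5)
```

A higher `session_weight` makes the profile react faster to the current session, at the risk of forgetting stable interests.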

8️⃣ Item Modeling

Item Features

Items can be represented using:

Title
Description
Category
Tags
Price
Popularity
Freshness
Ratings
Text embeddings
Image embeddings
Availability
Business constraints

LLM Role in Item Modeling

LLMs can help generate:

Tags
Summaries
Categories
Extracted attributes
Natural-language descriptions
Similarity explanations

👉 Interview Answer

Item modeling combines structured metadata, behavioral signals, embeddings, and generated attributes.

LLMs are useful for enriching item metadata, extracting attributes, summarizing content, and improving semantic matching.

9️⃣ Candidate Generation

Why Candidate Generation Matters

The catalog may hold millions or billions of items.

Running a heavy ranking model over every item on every request is not feasible, so a cheap first stage has to narrow the field.

Candidate Sources

Collaborative filtering
Content-based retrieval
Vector search
Trending items
Popular items
Recently viewed similar items
Search query matches
Business campaigns
Graph-based recommendations

Flow

User Profile
→ Retrieve 1,000 candidates
→ Pass to ranking stage

👉 Interview Answer

Candidate generation retrieves a smaller set of potentially relevant items from a large catalog.

It uses collaborative filtering, content-based retrieval, vector search, popularity, trending signals, and business rules.
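As an illustration of the vector-search source, here is a brute-force cosine-similarity retriever over a toy in-memory catalog. It is a sketch only: `retrieve_candidates` and the catalog layout are assumptions, and a real system would typically use an approximate nearest-neighbor index instead of scanning every item.

```python
import heapq
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_candidates(user_vec, item_vecs, k, exclude=()):
    """Return the k item ids most similar to user_vec, skipping excluded ids."""
    scored = (
        (cosine(user_vec, vec), item_id)
        for item_id, vec in item_vecs.items()
        if item_id not in exclude
    )
    # nlargest returns (score, item_id) pairs sorted from best to worst.
    return [item_id for _, item_id in heapq.nlargest(k, scored)]


catalog = {
    "item_a": [1.0, 0.0],
    "item_b": [0.7, 0.7],
    "item_c": [0.0, 1.0],
    "item_d": [-1.0, 0.0],
}
# Exclude items the user gave negative feedback on.
candidates = retrieve_candidates([1.0, 0.1], catalog, k=2, exclude={"item_d"})
```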

🔟 Ranking

Ranking Goal

Ranking orders candidates by expected user value.

Ranking Signals

User-item similarity
Click probability
Purchase probability
Watch time
Rating prediction
Freshness
Diversity
Price sensitivity
User intent
Business priority

Ranking Flow

Candidates
→ Feature Enrichment
→ Ranking Model
→ Ranked List
→ Filters
→ Final Recommendations

👉 Interview Answer

Ranking scores candidate items using user features, item features, behavioral signals, contextual signals, and business rules.

The ranker optimizes for predicted relevance or conversion.
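To make the scoring step concrete, the sketch below ranks candidates with a hand-weighted sum of a few signals. The weights and feature names (`p_click`, `p_purchase`, `freshness`) are illustrative assumptions; a production ranker would normally be a learned model, and the linear formula only stands in for it.

```python
# Illustrative weights, not tuned values.
WEIGHTS = {"p_click": 0.5, "p_purchase": 0.3, "freshness": 0.2}


def score(features, weights=WEIGHTS):
    """Weighted sum of the signals; missing features count as 0."""
    return sum(w * features.get(name, 0.0) for name, w in weights.items())


def rank(candidates, weights=WEIGHTS):
    """candidates maps item_id -> feature dict. Returns item ids, best first."""
    return sorted(
        candidates,
        key=lambda item_id: score(candidates[item_id], weights),
        reverse=True,
    )


candidates = {
    "item_a": {"p_click": 0.2, "p_purchase": 0.1, "freshness": 0.9},
    "item_b": {"p_click": 0.6, "p_purchase": 0.3, "freshness": 0.1},
    "item_c": {"p_click": 0.1, "p_purchase": 0.0, "freshness": 0.2},
}
ranked = rank(candidates)
```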

1️⃣1️⃣ Where LLMs Fit

LLMs Are Not Usually the First-stage Ranker

LLM inference is slow and costly per call compared with a conventional ranking model.

They are better used for:

Query understanding
User intent interpretation
Item metadata enrichment
Explanation generation
Re-ranking top candidates
Cold-start reasoning
Conversational recommendations

Common Pattern

Traditional recommender retrieves top 100
→ LLM re-ranks top 10 or explains results

👉 Interview Answer

LLMs are usually not used to rank millions of items directly.

They are best used after candidate generation, for intent understanding, metadata enrichment, small-set re-ranking, explanations, and conversational recommendation.

1️⃣2️⃣ LLM-based Re-ranking

Why Re-rank with LLM?

LLMs can reason over user intent and item descriptions.

Flow

Top 20 candidates
+ User preference
+ Current query
→ LLM re-ranker
→ Final top 5

Best For

High-value recommendations
Complex user constraints
Natural-language preferences
Cold-start users
Explainable recommendations

Cost Control

Only use LLMs on small candidate sets.

👉 Interview Answer

LLM re-ranking can improve quality when recommendations depend on nuanced user intent.

But because LLMs are expensive, they should only re-rank a small candidate set after cheaper retrieval and ranking stages.
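The re-ranking flow above can be sketched as follows. `call_llm` is a hypothetical stand-in for whatever model client is used (any function from prompt string to reply string); the prompt wording and the JSON reply format are assumptions. The code keeps the ranker's order as a fallback and drops any id the model invents.

```python
import json


def build_rerank_prompt(user_preference, query, candidates):
    lines = [
        f'- {c["item_id"]}: {c["title"]} (tags: {", ".join(c["tags"])})'
        for c in candidates
    ]
    return (
        "Re-rank the items for this user. "
        "Reply with a JSON list of item_id strings, best first.\n"
        f"User preference: {user_preference}\n"
        f"Query: {query}\n"
        "Items:\n" + "\n".join(lines)
    )


def llm_rerank(call_llm, user_preference, query, candidates, top_n=5):
    ranker_order = [c["item_id"] for c in candidates]
    prompt = build_rerank_prompt(user_preference, query, candidates)
    try:
        reply = json.loads(call_llm(prompt))
    except (ValueError, TypeError):
        reply = []  # unparseable reply: fall back to the ranker order
    if not isinstance(reply, list):
        reply = []
    known = set(ranker_order)
    picked = []
    for item_id in reply:
        # Keep only ids that were actually offered, and only once each.
        if isinstance(item_id, str) and item_id in known and item_id not in picked:
            picked.append(item_id)
    picked += [i for i in ranker_order if i not in picked]
    return picked[:top_n]


def fake_llm(prompt):
    # Stub for illustration; "item_99" simulates a hallucinated id.
    return '["item_3", "item_99", "item_1"]'


candidates = [
    {"item_id": "item_1", "title": "Intro to Backend", "tags": ["backend"]},
    {"item_id": "item_2", "title": "Cooking Basics", "tags": ["cooking"]},
    {"item_id": "item_3", "title": "Distributed Systems Course", "tags": ["scalability"]},
]
result = llm_rerank(fake_llm, "prefers hands-on courses", "learn system design", candidates, top_n=2)
```

Keeping the candidate list short bounds the prompt size, which is where most of the per-request cost comes from.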

1️⃣3️⃣ Conversational Recommendation

Why Conversation Helps

Users often do not know exactly what they want.

The assistant can ask follow-up questions.

Example

User:
Recommend a laptop for AI work.

Assistant:
Do you care more about GPU performance,
battery life,
or portability?

Flow

User preference
→ Clarify constraints
→ Retrieve candidates
→ Rank
→ Explain recommendations

👉 Interview Answer

LLMs are especially useful for conversational recommendations.

They can clarify user intent, collect constraints, explain trade-offs, and turn vague preferences into structured recommendation filters.
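Once the conversation has produced structured constraints, applying them is ordinary filtering. The sketch below assumes the LLM step has already emitted a `Constraints` object; that class, its fields, and the laptop records are hypothetical examples, not a fixed schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Constraints:
    max_price: Optional[float] = None
    required_tags: frozenset = frozenset()


def matches(item, constraints):
    if constraints.max_price is not None and item["price"] > constraints.max_price:
        return False
    # Every required tag must be present on the item.
    return constraints.required_tags <= set(item["tags"])


laptops = [
    {"item_id": "laptop_1", "price": 1800, "tags": ["gpu", "heavy"]},
    {"item_id": "laptop_2", "price": 1200, "tags": ["portable", "long-battery"]},
    {"item_id": "laptop_3", "price": 2600, "tags": ["gpu", "portable"]},
]

# Assumed result of the clarifying question: GPU performance matters most,
# and the budget is 2000.
constraints = Constraints(max_price=2000, required_tags=frozenset({"gpu"}))
filtered = [laptop["item_id"] for laptop in laptops if matches(laptop, constraints)]
```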

1️⃣4️⃣ Cold Start

Cold-start Problems

New User

No behavior history.

New Item

No interaction history.

How LLMs Help

LLMs can use:

Natural-language user preferences
Item descriptions
Categories
Reviews
Similarity reasoning
Onboarding questions

👉 Interview Answer

LLMs can help with cold-start problems by using item descriptions, user-provided preferences, onboarding answers, and semantic reasoning before enough behavioral data exists.
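For a new user with no history, a content-only fallback can match stated interests against item tags. The sketch below uses Jaccard overlap as a deliberately simple stand-in for embedding similarity or LLM-based semantic matching; `cold_start_recommend` is a hypothetical helper.

```python
def jaccard(a, b):
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cold_start_recommend(stated_interests, items, k=2):
    """Rank items by tag overlap with interests collected during onboarding."""
    ordered = sorted(
        items,
        key=lambda item: jaccard(stated_interests, item["tags"]),
        reverse=True,
    )
    return [item["item_id"] for item in ordered[:k]]


items = [
    {"item_id": "item_1", "tags": ["backend", "scalability"]},
    {"item_id": "item_2", "tags": ["cooking"]},
    {"item_id": "item_3", "tags": ["backend", "scalability", "architecture"]},
]
recs = cold_start_recommend(["backend", "architecture", "scalability"], items)
```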

1️⃣5️⃣ Feedback Loop

What Feedback to Collect

Impressions
Clicks
Likes
Dislikes
Purchases
Watch time
Add to cart
Skips
Dwell time
Explicit ratings

Feedback Flow

Recommendation shown
→ User interacts
→ Event logged
→ Feature store updated
→ Models retrained
→ Recommendations improve

👉 Interview Answer

Recommendation systems depend on feedback loops.

The system should log impressions, clicks, conversions, skips, ratings, and dwell time, then use these signals to update features and retrain models.
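A small example of turning logged events into a feature: per-item click-through rate computed from impressions and clicks. The event dictionary shape and the `click_through_rates` helper are assumptions for illustration; a real pipeline would run this as a streaming or batch job and write the result to the feature store.

```python
from collections import Counter


def click_through_rates(events, min_impressions=1):
    """Return item_id -> clicks / impressions for items with enough impressions."""
    impressions, clicks = Counter(), Counter()
    for event in events:
        if event["event_type"] == "impression":
            impressions[event["item_id"]] += 1
        elif event["event_type"] == "click":
            clicks[event["item_id"]] += 1
    return {
        item_id: clicks[item_id] / shown
        for item_id, shown in impressions.items()
        if shown >= min_impressions
    }


events = [
    {"user_id": "user_123", "item_id": "item_1", "event_type": "impression"},
    {"user_id": "user_123", "item_id": "item_1", "event_type": "click"},
    {"user_id": "user_456", "item_id": "item_1", "event_type": "impression"},
    {"user_id": "user_456", "item_id": "item_2", "event_type": "impression"},
]
ctr = click_through_rates(events)
```

Logging impressions matters as much as logging clicks: without the denominator, an item shown once and clicked once looks identical to one shown a thousand times.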

1️⃣6️⃣ Safety, Fairness, and Business Rules

Important Controls

Recommendations must respect:

User privacy
Age restrictions
Legal constraints
Inventory availability
Content safety
Fair exposure
Diversity
Avoiding harmful recommendations
Business rules

Example

Do not recommend unavailable products.

Do not recommend restricted content to underage users.

👉 Interview Answer

Recommendation systems need safety, fairness, and business-rule filters.

The final list should respect privacy, availability, age rules, legal constraints, content safety, diversity, and fairness goals.
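The two example rules above (availability and age restriction) can be applied as a final pass over the ranked list. This is a minimal sketch; the `in_stock` and `min_age` field names are assumptions, and items with missing stock information are dropped on purpose so the filter fails closed.

```python
def apply_filters(ranked_items, user_age):
    """Drop unavailable or age-restricted items while preserving rank order."""
    allowed = []
    for item in ranked_items:
        if not item.get("in_stock", False):
            continue  # unknown or zero availability: do not recommend
        if user_age < item.get("min_age", 0):
            continue  # restricted content for this user
        allowed.append(item)
    return allowed


ranked_items = [
    {"item_id": "item_1", "in_stock": True, "min_age": 18},
    {"item_id": "item_2", "in_stock": False},
    {"item_id": "item_3", "in_stock": True},
    {"item_id": "item_4"},
]
final_for_minor = [i["item_id"] for i in apply_filters(ranked_items, user_age=15)]
final_for_adult = [i["item_id"] for i in apply_filters(ranked_items, user_age=30)]
```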

1️⃣7️⃣ Evaluation Metrics

Offline Metrics

Precision@K
Recall@K
NDCG
MAP
Diversity
Coverage
Calibration

Online Metrics

Click-through rate
Conversion rate
Watch time
Revenue per session
Retention
User satisfaction
Long-term engagement

LLM-specific Metrics

Explanation quality
Constraint satisfaction
Hallucination rate
User trust

👉 Interview Answer

I would evaluate recommendation systems using both offline and online metrics.

Offline metrics include precision, recall, NDCG, diversity, and coverage.

Online metrics include CTR, conversion, retention, satisfaction, and long-term engagement.
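The main offline metrics are short enough to write out. The sketch below assumes binary relevance (an item is either relevant or not), which is a simplification; graded relevance would change the gain term in NDCG.

```python
import math


def precision_at_k(recommended, relevant, k):
    return sum(1 for item in recommended[:k] if item in relevant) / k


def recall_at_k(recommended, relevant, k):
    if not relevant:
        return 0.0
    return sum(1 for item in recommended[:k] if item in relevant) / len(relevant)


def ndcg_at_k(recommended, relevant, k):
    # Position 0 gets discount log2(2) = 1, position 1 gets log2(3), and so on.
    dcg = sum(
        1.0 / math.log2(pos + 2)
        for pos, item in enumerate(recommended[:k])
        if item in relevant
    )
    ideal_hits = min(k, len(relevant))
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg else 0.0


recommended = ["a", "b", "c", "d"]
relevant = {"a", "c", "e"}

p2 = precision_at_k(recommended, relevant, 2)
r4 = recall_at_k(recommended, relevant, 4)
n3 = ndcg_at_k(recommended, relevant, 3)
```

Here `p2` is 0.5 (one of the top two is relevant) and `r4` is 2/3 (two of the three relevant items were retrieved).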

1️⃣8️⃣ Common Failure Modes

Failure Modes

AI recommendation systems can fail because of:

Filter bubbles
Popularity bias
Bad cold-start handling
Stale preferences
Over-personalization
Hallucinated explanations
Irrelevant recommendations
Privacy leakage
Ignoring business constraints

Example

The LLM explains that a product matches the user's preferences,
but the product is out of stock.

👉 Interview Answer

Recommendation systems fail when they overfit to past behavior, ignore freshness, amplify popularity bias, violate constraints, or produce hallucinated explanations.

LLM-generated explanations must be grounded in real item attributes.
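One way to enforce grounding is to check every attribute an explanation mentions against the item record before showing it. The sketch below is an assumption-heavy illustration: it presumes the LLM returns the attributes it relied on as a list (`claimed_attributes`), and `grounded_explanation` is a hypothetical helper.

```python
def grounded_explanation(item, claimed_attributes):
    """Return an explanation built only from supported claims, or None."""
    if not item.get("in_stock", False):
        return None  # never explain an item that cannot be recommended
    facts = set(item["tags"]) | {item["category"]}
    supported = [attr for attr in claimed_attributes if attr in facts]
    if not supported:
        return None  # nothing the metadata can back up
    return f'{item["title"]} matches your interest in {", ".join(supported)}.'


item = {
    "item_id": "item_456",
    "title": "Distributed Systems Course",
    "category": "education",
    "tags": ["backend", "scalability", "architecture"],
    "in_stock": True,
}
# "gpu" is not in the metadata, so it is dropped from the explanation.
text = grounded_explanation(item, ["backend", "gpu"])
out_of_stock = grounded_explanation({**item, "in_stock": False}, ["backend"])
```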

1️⃣9️⃣ Cost Control

Cost Drivers

LLM re-ranking
Embedding generation
Vector search
Feature computation
Real-time personalization
Frequent model updates
Long prompts

Controls

Use LLM only on top candidates
Cache item embeddings
Cache recommendation results
Precompute candidates
Use smaller models for explanations
Limit prompt size
Batch offline processing

👉 Interview Answer

Cost control is important because LLM-based recommendation can be expensive.

I would use traditional retrieval and ranking first, then apply LLMs only on small candidate sets or explanation generation.
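Caching is the cheapest of these controls to demonstrate. The sketch below wraps a stand-in for an expensive explanation call with `functools.lru_cache`; `expensive_explain` and the call counter are hypothetical and exist only to show that the second request never reaches the model.

```python
from functools import lru_cache

calls = {"count": 0}


def expensive_explain(item_id, reason):
    # Placeholder for an LLM call; counts invocations for the demo.
    calls["count"] += 1
    return f"Recommended {item_id} because of {reason}."


@lru_cache(maxsize=10_000)
def cached_explain(item_id, reason):
    return expensive_explain(item_id, reason)


first = cached_explain("item_456", "backend")
second = cached_explain("item_456", "backend")  # served from the cache
```

The cache key here is the pair `(item_id, reason)`, so explanations are shared across users with the same reason. Personalizing the text per user would lower the hit rate, which is part of the quality versus cost trade-off.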

2️⃣0️⃣ Best Practices

Practical Rules

Do not use LLMs to rank the full catalog
Use candidate generation first
Use ranking model before LLM re-ranking
Use LLMs for intent, explanations, and small-set reasoning
Ground explanations in item metadata
Collect explicit and implicit feedback
Add safety and business filters
Evaluate with online experiments
Monitor bias and diversity
Control cost with caching

Design Principle

Use traditional recommenders for scale.
Use LLMs for reasoning,
conversation,
and explanation.

👉 Interview Answer

A strong LLM recommendation system combines traditional recommender infrastructure with LLM reasoning.

Candidate generation and ranking handle scale.

LLMs help with intent understanding, conversational clarification, small-set re-ranking, cold start, and explanations.

🧠 Final Staff-Level Answer

👉 Interview Answer (Full Version)

To design an AI recommendation system with LLMs, I would not use the LLM as the entire recommendation engine.

A production recommender still needs scalable candidate generation, ranking, feature stores, user profiles, item catalogs, feedback loops, and business-rule filters.

The system starts by building user profiles from explicit preferences, click history, purchases, searches, ratings, and recent session behavior.

Items are represented using structured metadata, behavioral signals, tags, text descriptions, images, embeddings, freshness, availability, and popularity.

Candidate generation retrieves a manageable set of items from a large catalog using collaborative filtering, content-based retrieval, vector search, popularity, trending signals, graph relationships, and business campaigns.

A ranking model then scores candidates using user features, item features, context, behavioral signals, and business goals.

LLMs fit best after this scalable retrieval and ranking pipeline.

They can help understand natural-language intent, enrich item metadata, solve cold-start problems, ask clarifying questions, re-rank a small candidate set, and generate recommendation explanations.

LLMs should not rank millions of items directly because that would be too slow and too expensive.

For conversational recommendations, the LLM can turn vague user preferences into structured constraints, ask follow-up questions, and explain trade-offs.

For cold-start users or new items, the LLM can use natural-language preferences and item descriptions before enough interaction data exists.

The feedback loop is critical.

The system should collect impressions, clicks, skips, purchases, dwell time, ratings, and user feedback, then update features and retrain ranking models.

Safety and fairness filters should enforce privacy, availability, age restrictions, legal constraints, diversity, and business rules.

Evaluation should include offline metrics like precision@K, recall@K, NDCG, diversity, and coverage, plus online metrics like CTR, conversion, retention, satisfaction, and long-term engagement.

The core principle is: use traditional recommenders for scale, and use LLMs for reasoning, conversation, and explanation.

⭐ Final Insight

The core of an LLM recommendation system is not:

"Hand the user's data and the entire catalog to the LLM."

The real system is:

User Modeling

Item Modeling

Candidate Generation

Ranking

LLM Re-ranking

Conversational Clarification

Explanation Generation

Feedback Loop

Safety Filters

Evaluation

Traditional recommenders handle scale.

LLMs handle reasoning, conversation, and explanation.

The single most important takeaway:

Use traditional recommenders for scale.

Use LLMs for reasoning, conversation, and explanation.
