🎯 Design an AI Recommendation System with LLMs
1️⃣ Core Framework
When designing an AI Recommendation System with LLMs, I frame it as:
- Product requirements
- User and item modeling
- Candidate generation
- Ranking and personalization
- LLM-based reasoning layer
- Feedback loop
- Safety and fairness
- Trade-offs: relevance vs latency vs cost
2️⃣ Product Goal
An AI recommendation system suggests relevant items to users.
Examples:
- Products
- Videos
- Articles
- Courses
- Jobs
- Restaurants
- Games
- Financial products
- Internal documents
Basic Flow
User Context
→ Candidate Retrieval
→ Ranking
→ LLM Reasoning / Explanation
→ Personalized Recommendations
→ Feedback Loop
👉 Interview Answer
An AI recommendation system uses user behavior, item metadata, embeddings, ranking models, and sometimes LLM reasoning to recommend relevant items.
The LLM is usually not the whole recommender.
It is often used for understanding intent, enriching metadata, explaining recommendations, or re-ranking small candidate sets.
3️⃣ Functional Requirements
Core Features
The system should support:
- Personalized recommendations
- Similar item recommendations
- Search-based recommendations
- Trending recommendations
- Cold-start recommendations
- Explanation for recommendations
- User feedback
- Filtering and constraints
Examples
Recommend products similar to this item.
Recommend videos based on watch history.
Recommend jobs based on resume and preferences.
Recommend documents based on current task.
👉 Interview Answer
Core requirements include personalized recommendations, similar-item recommendations, cold-start handling, ranking, feedback collection, filtering, and recommendation explanations.
4️⃣ Non-functional Requirements
Important System Qualities
The system should optimize for:
- Relevance
- Low latency
- Scalability
- Freshness
- Diversity
- Fairness
- Explainability
- Cost efficiency
- Privacy
Key Trade-off
More personalization and LLM reasoning
→ Better quality
But also
→ Higher latency and cost
👉 Interview Answer
Non-functional requirements include relevance, latency, scalability, freshness, diversity, fairness, explainability, privacy, and cost efficiency.
The core trade-off is recommendation quality versus latency and cost.
5️⃣ High-Level Architecture
Architecture
Client
→ Recommendation API
→ User Profile Service
→ Candidate Generation
→ Feature Store
→ Ranking Service
→ LLM Reasoning / Explanation Service
→ Business Rules / Safety Filters
→ Response
→ Feedback Pipeline
Core Components
User Profile Service
Stores user preferences and behavior.
Candidate Generation
Finds possible recommendation items.
Ranking Service
Scores and orders candidates.
LLM Layer
Explains, re-ranks, or personalizes recommendations.
Feedback Pipeline
Collects clicks, views, purchases, likes, skips, and conversions.
👉 Interview Answer
A production recommendation system usually includes user profiles, item catalogs, candidate generation, feature stores, ranking models, business filters, an optional LLM reasoning layer, and a feedback pipeline.
6️⃣ Data Model
Main Entities
User
Item
UserEvent
RecommendationRequest
CandidateItem
RankedItem
FeedbackEvent
User Profile Example
{
"user_id": "user_123",
"interests": ["machine learning", "system design"],
"recent_views": ["item_1", "item_2"],
"purchased_items": ["item_3"],
"negative_feedback": ["item_9"]
}
Item Metadata Example
{
"item_id": "item_456",
"title": "Distributed Systems Course",
"category": "education",
"tags": ["backend", "scalability", "architecture"],
"embedding": [0.12, -0.44, 0.88]
}
👉 Interview Answer
I would model users, items, user events, candidate items, ranked items, and feedback events.
User profiles capture preferences and behavior, while item metadata captures categories, tags, text, images, embeddings, and business attributes.
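The entities above can be sketched as plain Python dataclasses. This is a minimal illustration that mirrors the JSON examples; the `FeedbackEvent` fields and the `excluded_item_ids` helper are assumptions added here, and excluding purchased items only makes sense for catalogs where repeat purchases are rare.

```python
from dataclasses import dataclass, field


@dataclass
class UserProfile:
    user_id: str
    interests: list[str] = field(default_factory=list)
    recent_views: list[str] = field(default_factory=list)
    purchased_items: list[str] = field(default_factory=list)
    negative_feedback: list[str] = field(default_factory=list)


@dataclass
class Item:
    item_id: str
    title: str
    category: str
    tags: list[str] = field(default_factory=list)
    embedding: list[float] = field(default_factory=list)


@dataclass
class FeedbackEvent:
    # Hypothetical fields; a real event schema would carry more context.
    user_id: str
    item_id: str
    event_type: str  # e.g. "impression", "click", "purchase", "skip"
    timestamp: float


def excluded_item_ids(profile: UserProfile) -> set[str]:
    """Items to keep out of new recommendations: already bought or disliked."""
    return set(profile.purchased_items) | set(profile.negative_feedback)
```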
7️⃣ User Modeling
What User Profile Includes
A user profile may include:
- Explicit preferences
- Click history
- View history
- Purchase history
- Search history
- Likes and dislikes
- Recent session behavior
- Long-term interests
- Negative feedback
Short-term vs Long-term Preference
Short-term:
What the user is doing now.
Long-term:
What the user generally likes.
👉 Interview Answer
User modeling combines long-term preferences and short-term session intent.
The long-term profile captures stable interests, while short-term behavior captures what the user wants right now.
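One simple way to combine the two signals is a weighted blend of a long-term embedding and a session embedding. This is a sketch under assumptions: both vectors are already computed and have the same dimension, and `session_weight` is a hypothetical tuning knob, not a recommended value.

```python
def blend_user_vector(long_term: list[float],
                      short_term: list[float],
                      session_weight: float = 0.3) -> list[float]:
    """Linear blend of long-term interests and current-session intent."""
    if len(long_term) != len(short_term):
        raise ValueError("embeddings must have the same dimension")
    if not 0.0 <= session_weight <= 1.0:
        raise ValueError("session_weight must be in [0, 1]")
    return [
        (1.0 - session_weight) * lt + session_weight * st
        for lt, st in zip(long_term, short_term)
    ]
```

A higher `session_weight` makes recommendations react faster to what the user is doing now, at the cost of drifting away from stable interests.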
8️⃣ Item Modeling
Item Features
Items can be represented using:
- Title
- Description
- Category
- Tags
- Price
- Popularity
- Freshness
- Ratings
- Text embeddings
- Image embeddings
- Availability
- Business constraints
LLM Role in Item Modeling
LLMs can help generate:
- Tags
- Summaries
- Categories
- Extracted attributes
- Natural-language descriptions
- Similarity explanations
👉 Interview Answer
Item modeling combines structured metadata, behavioral signals, embeddings, and generated attributes.
LLMs are useful for enriching item metadata, extracting attributes, summarizing content, and improving semantic matching.
9️⃣ Candidate Generation
Why Candidate Generation Matters
The system may have millions or billions of items.
It cannot run an expensive ranking model over every item on every request.
Candidate Sources
- Collaborative filtering
- Content-based retrieval
- Vector search
- Trending items
- Popular items
- Recently viewed similar items
- Search query matches
- Business campaigns
- Graph-based recommendations
Flow
User Profile
→ Retrieve 1,000 candidates
→ Pass to ranking stage
👉 Interview Answer
Candidate generation retrieves a smaller set of potentially relevant items from a large catalog.
It uses collaborative filtering, content-based retrieval, vector search, popularity, trending signals, and business rules.
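The retrieval step can be sketched as a top-k similarity search plus a merge of several candidate sources. This is a minimal illustration: the brute-force cosine scan stands in for an approximate nearest-neighbor index, and `vector_candidates` and `merge_candidates` are hypothetical helper names.

```python
import heapq
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def vector_candidates(user_vec: list[float],
                      item_vecs: dict[str, list[float]],
                      k: int) -> list[str]:
    """Return the k item ids most similar to the user vector, best first."""
    return heapq.nlargest(
        k, item_vecs, key=lambda item_id: cosine(user_vec, item_vecs[item_id])
    )


def merge_candidates(*sources: list[str], limit: int) -> list[str]:
    """Concatenate sources in priority order, dropping duplicate ids."""
    seen: set[str] = set()
    merged: list[str] = []
    for source in sources:
        for item_id in source:
            if item_id not in seen:
                seen.add(item_id)
                merged.append(item_id)
    return merged[:limit]
```

In use, the output of `vector_candidates` would be merged with trending or campaign lists and the result passed to the ranking stage.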
🔟 Ranking
Ranking Goal
Ranking orders candidates by expected user value.
Ranking Signals
- User-item similarity
- Click probability
- Purchase probability
- Watch time
- Rating prediction
- Freshness
- Diversity
- Price sensitivity
- User intent
- Business priority
Ranking Flow
Candidates
→ Feature Enrichment
→ Ranking Model
→ Ranked List
→ Filters
→ Final Recommendations
👉 Interview Answer
Ranking scores candidate items using user features, item features, behavioral signals, contextual signals, and business rules.
The ranker optimizes for predicted relevance or conversion.
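As an illustration of the ranking flow, the sketch below scores each candidate with a fixed weighted sum. A production ranker would normally be a learned model; the feature names and `WEIGHTS` values here are assumptions chosen only to make the example concrete.

```python
from dataclasses import dataclass


@dataclass
class CandidateFeatures:
    item_id: str
    similarity: float  # user-item similarity in [0, 1]
    p_click: float     # predicted click probability
    freshness: float   # 1.0 = brand new, 0.0 = stale


# Hypothetical hand-set weights standing in for a trained model.
WEIGHTS = {"similarity": 0.5, "p_click": 0.4, "freshness": 0.1}


def score(c: CandidateFeatures) -> float:
    return (
        WEIGHTS["similarity"] * c.similarity
        + WEIGHTS["p_click"] * c.p_click
        + WEIGHTS["freshness"] * c.freshness
    )


def rank(candidates: list[CandidateFeatures], top_n: int) -> list[CandidateFeatures]:
    """Order candidates by score, highest first, and keep the top_n."""
    return sorted(candidates, key=score, reverse=True)[:top_n]
```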
1️⃣1️⃣ Where LLMs Fit
LLMs Are Not Usually the First-stage Ranker
LLM inference is slow and expensive per item.
LLMs are better used for:
- Query understanding
- User intent interpretation
- Item metadata enrichment
- Explanation generation
- Re-ranking top candidates
- Cold-start reasoning
- Conversational recommendations
Common Pattern
Traditional recommender retrieves top 100
→ LLM re-ranks top 10 or explains results
👉 Interview Answer
LLMs are usually not used to rank millions of items directly.
They are best used after candidate generation, for intent understanding, metadata enrichment, small-set re-ranking, explanations, and conversational recommendation.
1️⃣2️⃣ LLM-based Re-ranking
Why Re-rank with LLM?
LLMs can reason over user intent and item descriptions.
Flow
Top 20 candidates
+ User preference
+ Current query
→ LLM re-ranker
→ Final top 5
Best For
- High-value recommendations
- Complex user constraints
- Natural-language preferences
- Cold-start users
- Explainable recommendations
Cost Control
Only use LLMs on small candidate sets.
👉 Interview Answer
LLM re-ranking can improve quality when recommendations depend on nuanced user intent.
But because LLMs are expensive, they should only re-rank a small candidate set after cheaper retrieval and ranking stages.
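The re-ranking flow above might look like the sketch below. It assumes the model is reached through an injected `call_llm(prompt) -> str` callable (hypothetical, so no specific provider API is implied) and that the model is asked to answer with a JSON array of item ids. Unknown ids are ignored, and if the output cannot be parsed the original ranking order is kept.

```python
import json
from typing import Callable


def build_rerank_prompt(user_preference: str, query: str,
                        candidates: list[dict], top_n: int) -> str:
    listing = "\n".join(f'{c["item_id"]}: {c["title"]}' for c in candidates)
    return (
        f"User preference: {user_preference}\n"
        f"Current query: {query}\n"
        f"Candidates:\n{listing}\n"
        f"Return a JSON array with the {top_n} best item_ids, best first."
    )


def llm_rerank(candidates: list[dict], user_preference: str, query: str,
               call_llm: Callable[[str], str],
               top_n: int = 5, max_candidates: int = 20) -> list[dict]:
    # Cost control: only a small shortlist ever reaches the LLM.
    shortlist = candidates[:max_candidates]
    prompt = build_rerank_prompt(user_preference, query, shortlist, top_n)
    try:
        ranked_ids = json.loads(call_llm(prompt))
    except (ValueError, TypeError):
        ranked_ids = []
    if not isinstance(ranked_ids, list):
        ranked_ids = []

    by_id = {c["item_id"]: c for c in shortlist}
    ordered, seen = [], set()
    for item_id in ranked_ids:
        # Drop ids the model invented or repeated.
        if isinstance(item_id, str) and item_id in by_id and item_id not in seen:
            seen.add(item_id)
            ordered.append(by_id[item_id])
    # Fall back to the upstream ranking for anything the model left out.
    for c in shortlist:
        if c["item_id"] not in seen:
            seen.add(c["item_id"])
            ordered.append(c)
    return ordered[:top_n]
```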
1️⃣3️⃣ Conversational Recommendation
Why Conversation Helps
Users often do not know exactly what they want.
The assistant can ask follow-up questions.
Example
User:
Recommend a laptop for AI work.
Assistant:
Do you care more about GPU performance,
battery life,
or portability?
Flow
User preference
→ Clarify constraints
→ Retrieve candidates
→ Rank
→ Explain recommendations
👉 Interview Answer
LLMs are especially useful for conversational recommendations.
They can clarify user intent, collect constraints, explain trade-offs, and turn vague preferences into structured recommendation filters.
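For the laptop example, the structured filters could be modeled as below. This is a sketch: it assumes an LLM has already extracted the constraints from the conversation, and the field names (`max_price`, `min_gpu_memory_gb`, `max_weight_kg`) are hypothetical. Unset fields tell the assistant which follow-up questions are still worth asking.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LaptopConstraints:
    max_price: Optional[float] = None
    min_gpu_memory_gb: Optional[int] = None
    max_weight_kg: Optional[float] = None


def missing_constraints(c: LaptopConstraints) -> list[str]:
    """Constraint names the user has not specified yet."""
    return [name for name, value in vars(c).items() if value is None]


def matches(item: dict, c: LaptopConstraints) -> bool:
    """True if the item satisfies every constraint that has been set."""
    if c.max_price is not None and item["price"] > c.max_price:
        return False
    if c.min_gpu_memory_gb is not None and item["gpu_memory_gb"] < c.min_gpu_memory_gb:
        return False
    if c.max_weight_kg is not None and item["weight_kg"] > c.max_weight_kg:
        return False
    return True
```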
1️⃣4️⃣ Cold Start
Cold-start Problems
New User
No behavior history.
New Item
No interaction history.
LLM Help
LLMs can use:
- Natural-language user preferences
- Item descriptions
- Categories
- Reviews
- Similarity reasoning
- Onboarding questions
👉 Interview Answer
LLMs can help with cold-start problems by using item descriptions, user-provided preferences, onboarding answers, and semantic reasoning before enough behavioral data exists.
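Before any behavior exists, a new user's onboarding answers can be matched against item tags directly. The sketch below uses Jaccard overlap as a simple stand-in for semantic matching; an embedding or LLM-based matcher would replace it in practice. It assumes catalog tags are stored in lowercase.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cold_start_recommendations(onboarding_interests: list[str],
                               catalog: dict[str, set[str]],
                               k: int) -> list[str]:
    """Rank items by tag overlap with the interests a new user stated."""
    interests = {tag.lower() for tag in onboarding_interests}
    scored = [(jaccard(interests, tags), item_id) for item_id, tags in catalog.items()]
    # Keep only items with some overlap; break score ties by item id.
    scored = [(s, item_id) for s, item_id in scored if s > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [item_id for _, item_id in scored[:k]]
```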
1️⃣5️⃣ Feedback Loop
What Feedback to Collect
- Impressions
- Clicks
- Likes
- Dislikes
- Purchases
- Watch time
- Add to cart
- Skips
- Dwell time
- Explicit ratings
Feedback Flow
Recommendation shown
→ User interacts
→ Event logged
→ Feature store updated
→ Models retrained
→ Recommendations improve
👉 Interview Answer
Recommendation systems depend on feedback loops.
The system should log impressions, clicks, conversions, skips, ratings, and dwell time, then use these signals to update features and retrain models.
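As a small example of turning logged events into a feature, the sketch below computes per-item click-through rate from impression and click events. The event dictionary shape (`item_id`, `event_type`) is an assumption, and `min_impressions` is a hypothetical guard against noisy rates on rarely shown items.

```python
from collections import Counter


def click_through_rates(events: list[dict],
                        min_impressions: int = 1) -> dict[str, float]:
    """Per-item CTR = clicks / impressions, for items shown often enough."""
    impressions: Counter = Counter()
    clicks: Counter = Counter()
    for event in events:
        if event["event_type"] == "impression":
            impressions[event["item_id"]] += 1
        elif event["event_type"] == "click":
            clicks[event["item_id"]] += 1
    return {
        item_id: clicks[item_id] / shown
        for item_id, shown in impressions.items()
        if shown >= min_impressions
    }
```

Logging impressions, not just clicks, is what makes this ratio computable: without the denominator the system cannot tell an unpopular item from one that was never shown.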
1️⃣6️⃣ Safety, Fairness, and Business Rules
Important Controls
Recommendations must respect:
- User privacy
- Age restrictions
- Legal constraints
- Inventory availability
- Content safety
- Fair exposure
- Diversity
- Harm avoidance
- Business rules
Example
Do not recommend unavailable products.
Do not recommend restricted content to underage users.
👉 Interview Answer
Recommendation systems need safety, fairness, and business-rule filters.
The final list should respect privacy, availability, age rules, legal constraints, content safety, diversity, and fairness goals.
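The two example rules can be enforced as a final hard filter over the ranked list. This is a minimal sketch: the `in_stock` and `min_age` item fields are assumptions, and an item with no availability information is dropped on purpose (fail closed).

```python
def apply_filters(ranked_items: list[dict], user_age: int) -> list[dict]:
    """Drop items that violate availability or age rules, keeping rank order."""
    allowed = []
    for item in ranked_items:
        if not item.get("in_stock", False):
            continue  # unavailable, or availability unknown
        if user_age < item.get("min_age", 0):
            continue  # age-restricted content
        allowed.append(item)
    return allowed
```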
1️⃣7️⃣ Evaluation Metrics
Offline Metrics
- Precision@K
- Recall@K
- NDCG
- MAP
- Diversity
- Coverage
- Calibration
Online Metrics
- Click-through rate
- Conversion rate
- Watch time
- Revenue per session
- Retention
- User satisfaction
- Long-term engagement
LLM-specific Metrics
- Explanation quality
- Constraint satisfaction
- Hallucination rate
- User trust
👉 Interview Answer
I would evaluate recommendation systems using both offline and online metrics.
Offline metrics include precision, recall, NDCG, diversity, and coverage.
Online metrics include CTR, conversion, retention, satisfaction, and long-term engagement.
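The offline metrics can be computed as in the sketch below, assuming binary relevance and a recommendation list without duplicates. Precision@K here divides by K even when fewer than K items were recommended, which is one common convention.

```python
import math


def precision_at_k(recommended: list[str], relevant: set[str], k: int) -> float:
    if k <= 0:
        return 0.0
    hits = sum(1 for item_id in recommended[:k] if item_id in relevant)
    return hits / k


def recall_at_k(recommended: list[str], relevant: set[str], k: int) -> float:
    if not relevant:
        return 0.0
    hits = sum(1 for item_id in recommended[:k] if item_id in relevant)
    return hits / len(relevant)


def ndcg_at_k(recommended: list[str], relevant: set[str], k: int) -> float:
    # DCG with binary gains: a hit at 0-based rank r contributes 1 / log2(r + 2).
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item_id in enumerate(recommended[:k])
        if item_id in relevant
    )
    # Ideal DCG places all relevant items (up to k) at the top.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg else 0.0
```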
1️⃣8️⃣ Common Failure Modes
Failure Modes
AI recommendation systems can fail because of:
- Filter bubbles
- Popularity bias
- Bad cold-start handling
- Stale preferences
- Over-personalization
- Hallucinated explanations
- Irrelevant recommendations
- Privacy leakage
- Ignoring business constraints
Example
The LLM explains that a product matches the user's preferences,
but the product is out of stock.
👉 Interview Answer
Recommendation systems fail when they overfit to past behavior, ignore freshness, amplify popularity bias, violate constraints, or produce hallucinated explanations.
LLM-generated explanations must be grounded in real item attributes.
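One way to enforce grounding is to have the LLM return the attributes it relied on alongside the explanation text, then check them against the catalog before showing anything. The sketch below assumes that structured output (`cited_attributes`) exists; the function name and the `in_stock` field are hypothetical.

```python
def validate_explanation(item: dict, cited_attributes: dict) -> list[str]:
    """Return a list of problems; an empty list means the explanation may be shown."""
    problems = []
    if not item.get("in_stock", False):
        problems.append("item is not in stock")
    for name, claimed in cited_attributes.items():
        if name not in item:
            problems.append(f"unknown attribute: {name}")
        elif item[name] != claimed:
            problems.append(f"mismatch on {name}")
    return problems
```

If the list is non-empty, the system can drop the explanation, or the item itself, instead of showing an unsupported claim.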
1️⃣9️⃣ Cost Control
Cost Drivers
- LLM re-ranking
- Embedding generation
- Vector search
- Feature computation
- Real-time personalization
- Frequent model updates
- Long prompts
Controls
- Use LLM only on top candidates
- Cache item embeddings
- Cache recommendation results
- Precompute candidates
- Use smaller models for explanations
- Limit prompt size
- Batch offline processing
👉 Interview Answer
Cost control is important because LLM-based recommendation can be expensive.
I would use traditional retrieval and ranking first, then apply LLMs only on small candidate sets or explanation generation.
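Caching recommendation results can be sketched with a small time-to-live cache, so an expensive LLM call is not repeated for the same key within a short window. This is an in-process illustration with hypothetical names; a shared cache service would typically play this role in production.

```python
import time
from typing import Any, Callable, Hashable


class TTLCache:
    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: dict[Hashable, tuple[float, Any]] = {}

    def get_or_compute(self, key: Hashable, compute: Callable[[], Any]) -> Any:
        """Return the cached value if still fresh, otherwise recompute and store it."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        value = compute()
        self._entries[key] = (now, value)
        return value
```

The TTL is the freshness versus cost dial: a longer TTL saves more LLM calls but serves staler recommendations.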
2️⃣0️⃣ Best Practices
Practical Rules
- Do not use LLMs to rank the full catalog
- Use candidate generation first
- Use a ranking model before LLM re-ranking
- Use LLMs for intent, explanations, and small-set reasoning
- Ground explanations in item metadata
- Collect explicit and implicit feedback
- Add safety and business filters
- Evaluate with online experiments
- Monitor bias and diversity
- Control cost with caching
Design Principle
Use traditional recommenders for scale.
Use LLMs for reasoning,
conversation,
and explanation.
👉 Interview Answer
A strong LLM recommendation system combines traditional recommender infrastructure with LLM reasoning.
Candidate generation and ranking handle scale.
LLMs help with intent understanding, conversational clarification, small-set re-ranking, cold start, and explanations.
🧠 Final Staff-Level Answer
👉 Interview Answer (Full Version)
To design an AI recommendation system with LLMs, I would not use the LLM as the entire recommendation engine.
A production recommender still needs scalable candidate generation, ranking, feature stores, user profiles, item catalogs, feedback loops, and business-rule filters.
The system starts by building user profiles from explicit preferences, click history, purchases, searches, ratings, and recent session behavior.
Items are represented using structured metadata, behavioral signals, tags, text descriptions, images, embeddings, freshness, availability, and popularity.
Candidate generation retrieves a manageable set of items from a large catalog using collaborative filtering, content-based retrieval, vector search, popularity, trending signals, graph relationships, and business campaigns.
A ranking model then scores candidates using user features, item features, context, behavioral signals, and business goals.
LLMs fit best after this scalable retrieval and ranking pipeline.
They can help understand natural-language intent, enrich item metadata, solve cold-start problems, ask clarifying questions, re-rank a small candidate set, and generate recommendation explanations.
LLMs should not rank millions of items directly because that would be too slow and too expensive.
For conversational recommendations, the LLM can turn vague user preferences into structured constraints, ask follow-up questions, and explain trade-offs.
For cold-start users or new items, the LLM can use natural-language preferences and item descriptions before enough interaction data exists.
The feedback loop is critical.
The system should collect impressions, clicks, skips, purchases, dwell time, ratings, and user feedback, then update features and retrain ranking models.
Safety and fairness filters should enforce privacy, availability, age restrictions, legal constraints, diversity, and business rules.
Evaluation should include offline metrics like precision@K, recall@K, NDCG, diversity, and coverage, plus online metrics like CTR, conversion, retention, satisfaction, and long-term engagement.
The core principle is: use traditional recommenders for scale, and use LLMs for reasoning, conversation, and explanation.
⭐ Final Insight
The core of an LLM recommendation system is not:
"Hand the user's data and every item in the catalog to the LLM."
The real system is:
- User Modeling
- Item Modeling
- Candidate Generation
- Ranking
- LLM Re-ranking
- Conversational Clarification
- Explanation Generation
- Feedback Loop
- Safety Filters
- Evaluation
Traditional recommenders handle scale.
LLMs handle reasoning, conversation, and explanation.
The one line to remember:
Use traditional recommenders for scale.
Use LLMs for reasoning, conversation, and explanation.
Implement