🎯 Design an AI Search Engine like Perplexity
1️⃣ Core Framework
When designing an AI Search Engine, I frame it as:
- Product requirements
- Query understanding
- Search and retrieval
- Source ranking
- Answer generation
- Citation and grounding
- Freshness and crawling
- Trade-offs: accuracy vs latency vs cost
2️⃣ Product Goal
An AI search engine answers user questions using retrieved web or knowledge sources.
Unlike traditional search, it does not only return links.
It returns:
- Direct answer
- Source citations
- Relevant links
- Follow-up questions
- Search context
- Optional summaries
Basic Flow
User Query
→ Understand Intent
→ Retrieve Sources
→ Rank Sources
→ Build Context
→ Generate Answer
→ Cite Sources
👉 Interview Answer
An AI search engine combines search retrieval with LLM generation.
It retrieves relevant sources, ranks them, builds grounded context, generates a direct answer, and provides citations so users can verify the result.
3️⃣ Functional Requirements
Core Features
The system should support:
- User search query
- Web retrieval
- Source ranking
- AI-generated answer
- Citations
- Follow-up questions
- Search result links
- Query history
- Related queries
Advanced Features
- Real-time news search
- Academic search
- Image search
- Video search
- File search
- Personal knowledge search
- Multi-turn search conversation
- Search filters by date or source
👉 Interview Answer
The core requirements are query understanding, source retrieval, ranking, grounded answer generation, citations, and follow-up questions.
Advanced features include real-time search, vertical search, file search, personal search, and multi-turn search refinement.
4️⃣ Non-functional Requirements
Important System Qualities
The system should optimize for:
- Answer accuracy
- Source quality
- Low latency
- Freshness
- High availability
- Scalability
- Safety
- Cost efficiency
- Citation reliability
Key Trade-off
More sources and stronger models
→ Better answer quality
But also
→ Higher latency and cost
👉 Interview Answer
Non-functional requirements include accuracy, freshness, source quality, low latency, scalability, reliability, safety, and cost efficiency.
The core trade-off is answer quality versus latency and cost.
5️⃣ High-Level Architecture
Architecture
Client
→ API Gateway
→ Search Service
→ Query Understanding
→ Web Search / Index Retrieval
→ Source Ranker
→ Content Fetcher
→ Context Builder
→ LLM Generator
→ Citation Builder
→ Response Streamer
Core Components
Query Understanding
Classifies intent and rewrites the query.
Retrieval Layer
Searches web index, news index, documents, or external APIs.
Source Ranker
Ranks sources by relevance, freshness, authority, and diversity.
Context Builder
Selects useful snippets for the LLM.
Citation Builder
Links answer claims to sources.
👉 Interview Answer
A Perplexity-style system includes query understanding, retrieval, source ranking, content fetching, context building, LLM generation, citation generation, streaming, safety, and observability.
6️⃣ Query Understanding
Why Query Understanding Matters
User queries are often vague.
The system needs to understand:
- Intent
- Topic
- Required freshness
- Search vertical
- Expected answer type
- Ambiguity
- Whether browsing is needed
Example
Query:
"latest Apple earnings"
Intent:
financial/news search
Freshness:
recent
Source preference:
financial reports and reputable news
👉 Interview Answer
Query understanding decides what the user is asking, how fresh the answer needs to be, which retrieval sources to use, and whether the query should be rewritten or clarified.
7️⃣ Query Rewriting
Why Rewrite Queries?
User queries may not be optimal for search.
Examples
User query:
"openai latest model"
Rewritten queries:
"OpenAI latest model announcement 2026"
"OpenAI API latest model release"
Benefits
- Better retrieval recall
- Better source diversity
- Better exact matching
- Better freshness targeting
👉 Interview Answer
Query rewriting improves retrieval by converting a user question into search-friendly queries.
The system may generate multiple query variants to improve recall, freshness, and source diversity.
8️⃣ Retrieval Layer
Retrieval Sources
The system may retrieve from:
- Web index
- News index
- Academic index
- Internal knowledge base
- User files
- APIs
- Structured databases
Retrieval Flow
Query
→ Search Index
→ Candidate Sources
→ Fetch Snippets
→ Rank Results
Hybrid Retrieval
A strong system often combines:
- Keyword search
- Vector search
- Metadata filters
- Freshness filters
- Domain authority signals
👉 Interview Answer
The retrieval layer finds candidate sources.
A production AI search engine usually combines keyword search, vector search, metadata filters, freshness signals, and source authority signals.
9️⃣ Source Ranking
Why Ranking Matters
Not all sources are equal.
The system should rank by:
- Relevance
- Freshness
- Authority
- Trustworthiness
- Diversity
- Originality
- Accessibility
- User intent match
Example
For company financial results:
1. Official investor relations page
2. SEC filing
3. Reputable financial news
4. Blog summaries
👉 Interview Answer
Source ranking determines which documents should be used for answer generation.
The system should prefer relevant, fresh, authoritative, trustworthy, and diverse sources.
🔟 Content Fetching and Extraction
Why Fetch Full Content?
Search snippets may not contain enough context.
Content Extraction Steps
URL
→ Fetch page
→ Parse HTML
→ Remove boilerplate
→ Extract main content
→ Split into chunks
Challenges
- Paywalls
- Dynamic pages
- Duplicate content
- Ads and navigation
- Broken pages
- PDF parsing
- Rate limits
👉 Interview Answer
After retrieving candidate URLs, the system may fetch full pages, extract main content, remove boilerplate, chunk the text, and pass only relevant parts to the LLM.
1️⃣1️⃣ Context Builder
What Context Builder Does
The context builder selects the best evidence for the model.
It must decide:
- Which sources to include
- Which snippets to include
- How many tokens to allocate
- How to preserve source metadata
- How to avoid duplicate evidence
Context Format
Source 1:
Title
URL
Relevant excerpt
Source 2:
Title
URL
Relevant excerpt
👉 Interview Answer
The context builder converts ranked sources into a compact evidence package for the LLM.
It selects relevant snippets, preserves source metadata, removes duplicates, and keeps the prompt within token limits.
1️⃣2️⃣ Answer Generation
Generation Step
The LLM receives:
- User question
- Retrieved context
- Citation metadata
- Answer instructions
- Safety rules
- Output format
Good Answer Should
- Directly answer the question
- Use retrieved sources
- Cite claims
- Mention uncertainty
- Avoid unsupported facts
- Explain briefly and clearly
Important Rule
If sources do not support the answer,
say the evidence is insufficient.
👉 Interview Answer
The LLM should generate answers grounded in retrieved sources.
It should cite supporting evidence, avoid unsupported claims, and clearly state when the available sources are insufficient.
1️⃣3️⃣ Citation System
Why Citations Matter
Citations let users verify answers.
They also improve trust.
Citation Builder Responsibilities
- Link answer claims to source snippets
- Preserve URL and title
- Avoid fake citations
- Cite the strongest source
- Support multiple sources per claim
- Handle conflicting evidence
Bad Citation
Answer cites a source that does not support the claim.
Good Citation
Claim
→ Supported by retrieved source excerpt
→ Citation shown to user
👉 Interview Answer
Citations are critical for AI search engines.
The system should map answer claims to retrieved source evidence and avoid citing sources that do not actually support the claim.
1️⃣4️⃣ Freshness
Why Freshness Matters
Search queries often require current information.
Examples:
- News
- Sports
- Stock prices
- Product launches
- Laws
- Weather
- Recent papers
- Company announcements
Freshness Strategy
Query freshness classifier
→ If recent needed:
use news index / live web / APIs
→ Else:
use general index
👉 Interview Answer
Freshness is a key design requirement.
The system should detect whether a query needs recent information and route it to live search, news indexes, or APIs when freshness matters.
1️⃣5️⃣ Handling Conflicting Sources
Problem
Different sources may disagree.
Example
Source A says product launches in June.
Source B says product launches in July.
Strategy
- Prefer primary sources
- Compare dates
- Show uncertainty
- Cite both sides
- Avoid overconfident answer
👉 Interview Answer
AI search systems must handle conflicting evidence.
The answer should prefer primary and recent sources, mention disagreement when relevant, and avoid presenting uncertain claims as facts.
1️⃣6️⃣ Multi-turn Search
Why Multi-turn Matters
Users may refine their search.
Example
User: Best laptops for AI development
Assistant: Gives options
User: Only under $1500
Assistant: Refines search
State Needed
- Previous query
- Retrieved sources
- User constraints
- Current answer context
- Follow-up intent
👉 Interview Answer
Multi-turn AI search requires conversation state.
The system should track previous queries, constraints, retrieved sources, and user intent so follow-up searches can refine the answer.
1️⃣7️⃣ Safety and Abuse Prevention
Risks
AI search systems may surface:
- Misinformation
- Harmful instructions
- Unsafe advice
- Copyright-sensitive content
- Private data
- Malicious web pages
- Prompt injection from web content
Controls
- Source filtering
- Input moderation
- Output moderation
- Prompt injection defense
- Safe browsing filters
- Citation requirements
- High-risk query handling
👉 Interview Answer
AI search engines need safety controls for harmful content, misinformation, malicious pages, prompt injection, unsafe advice, and privacy risks.
Retrieved web content should be treated as untrusted input.
1️⃣8️⃣ Cost Control
Cost Drivers
- Web search calls
- Page fetching
- Content extraction
- Embeddings
- Re-ranking
- LLM generation
- Long context
- Frequent refresh
Controls
- Query caching
- Retrieval caching
- Source deduplication
- Smaller model routing
- Token budget limits
- Re-rank only top candidates
- Cache fresh results with TTL
👉 Interview Answer
Cost control is important because AI search may call search APIs, fetch pages, run ranking, build long contexts, and call large models.
Caching, model routing, token budgets, and candidate limits help control cost.
1️⃣9️⃣ Observability
What to Monitor
- Query latency
- Retrieval latency
- Fetch failures
- Source quality
- Citation accuracy
- Answer quality
- Freshness
- User feedback
- Cost per query
- Cache hit rate
- Safety blocks
Debugging Questions
- Which sources were retrieved?
- Why were they ranked?
- Which snippets were sent to the model?
- Which claim used which citation?
- Was the answer grounded?
👉 Interview Answer
Observability should trace retrieval, ranking, content fetching, context building, generation, citations, latency, cost, and user feedback.
Without this, AI search quality is hard to debug.
2️⃣0️⃣ Best Practices
Practical Rules
- Use query understanding before retrieval
- Generate multiple search queries when useful
- Prefer authoritative sources
- Preserve citation metadata
- Keep context focused
- Detect freshness requirements
- Treat web content as untrusted
- Handle conflicting sources explicitly
- Cache carefully with TTL
- Evaluate citation correctness
Design Principle
AI search quality depends on retrieval,
ranking,
grounding,
and citations,
not just the LLM.
👉 Interview Answer
A production AI search engine should combine strong retrieval, source ranking, context building, grounded generation, citation validation, freshness detection, safety, caching, and observability.
🧠 Staff-Level Answer Final
👉 Interview Answer Full Version
To design an AI search engine like Perplexity, I would treat it as a search and retrieval product with an LLM generation layer, not just a chatbot.
The system starts with the user query.
A query understanding service classifies the intent, freshness requirement, ambiguity, search vertical, and expected answer type.
It may rewrite the query into multiple search-friendly variants to improve retrieval recall and source diversity.
The retrieval layer searches across web indexes, news indexes, academic sources, internal knowledge bases, user files, or external APIs.
Candidate sources are ranked by relevance, freshness, authority, trustworthiness, diversity, and user intent match.
For important queries, the system may fetch full pages, extract main content, remove boilerplate, chunk the text, and keep only the most relevant snippets.
The context builder then creates a compact evidence package for the LLM.
It preserves source titles, URLs, snippets, timestamps, and citation metadata.
The LLM generates an answer grounded in that evidence.
It should cite sources, avoid unsupported claims, handle conflicting evidence, and say when the evidence is insufficient.
Citations are a core product feature.
The citation system should map claims to actual supporting source snippets, not just attach random URLs.
Freshness is also critical.
The system should detect when a query requires recent information and use live search, news indexes, or APIs instead of relying only on stale indexes.
Multi-turn search requires state: previous queries, user constraints, retrieved sources, and follow-up intent.
Safety is important because retrieved web content is untrusted and may contain misinformation, harmful content, or prompt injection.
Cost control requires caching, source deduplication, model routing, token budgeting, and limiting expensive re-ranking.
Observability should trace query rewriting, retrieved sources, ranking decisions, context sent to the model, citations, latency, cost, and user feedback.
The key principle is that AI search quality depends on retrieval, ranking, grounding, and citations, not just the LLM.
⭐ Final Insight
Perplexity-style AI Search 的核心不是:
“Search + LLM 总结”
而是:
Query Understanding
- Query Rewriting
- Retrieval
- Source Ranking
- Content Extraction
- Context Building
- Grounded Generation
- Citation Mapping
- Freshness Detection
- Safety
- Observability。
最重要的一句话:
AI search quality depends on retrieval, ranking, grounding, and citations, not just the LLM.
中文部分
🎯 Design an AI Search Engine like Perplexity
1️⃣ 核心框架
设计 AI Search Engine 时,我通常从这些方面分析:
- Product requirements
- Query understanding
- Search and retrieval
- Source ranking
- Answer generation
- Citation and grounding
- Freshness and crawling
- 核心权衡:accuracy vs latency vs cost
2️⃣ Product Goal
AI search engine 使用 retrieved web 或 knowledge sources 回答用户问题。
和传统 search 不同, 它不只是返回 links。
它会返回:
- Direct answer
- Source citations
- Relevant links
- Follow-up questions
- Search context
- Optional summaries
Basic Flow
User Query
→ Understand Intent
→ Retrieve Sources
→ Rank Sources
→ Build Context
→ Generate Answer
→ Cite Sources
👉 面试回答
AI search engine 结合 search retrieval 和 LLM generation。
它检索相关 sources, 对 sources 排序, 构建 grounded context, 生成直接答案, 并提供 citations 让用户可以验证结果。
3️⃣ Functional Requirements
Core Features
系统应该支持:
- User search query
- Web retrieval
- Source ranking
- AI-generated answer
- Citations
- Follow-up questions
- Search result links
- Query history
- Related queries
Advanced Features
- Real-time news search
- Academic search
- Image search
- Video search
- File search
- Personal knowledge search
- Multi-turn search conversation
- Search filters by date or source
👉 面试回答
核心需求包括 query understanding、 source retrieval、ranking、 grounded answer generation、citations 和 follow-up questions。
Advanced features 包括 real-time search、 vertical search、file search、 personal search 和 multi-turn search refinement。
4️⃣ Non-functional Requirements
Important System Qualities
系统应该优化:
- Answer accuracy
- Source quality
- Low latency
- Freshness
- High availability
- Scalability
- Safety
- Cost efficiency
- Citation reliability
Key Trade-off
More sources and stronger models
→ Better answer quality
But also
→ Higher latency and cost
👉 面试回答
Non-functional requirements 包括 accuracy、 freshness、source quality、low latency、 scalability、reliability、safety 和 cost efficiency。
核心权衡是 answer quality 和 latency / cost。
5️⃣ High-Level Architecture
Architecture
Client
→ API Gateway
→ Search Service
→ Query Understanding
→ Web Search / Index Retrieval
→ Source Ranker
→ Content Fetcher
→ Context Builder
→ LLM Generator
→ Citation Builder
→ Response Streamer
Core Components
Query Understanding
分类 intent 并 rewrite query。
Retrieval Layer
搜索 web index、news index、 documents 或 external APIs。
Source Ranker
根据 relevance、freshness、 authority 和 diversity 排序 sources。
Context Builder
为 LLM 选择有用 snippets。
Citation Builder
把 answer claims 连接到 sources。
👉 面试回答
Perplexity-style system 包括 query understanding、 retrieval、source ranking、content fetching、 context building、LLM generation、 citation generation、streaming、 safety 和 observability。
6️⃣ Query Understanding
为什么 Query Understanding 重要?
User queries 经常很模糊。
系统需要理解:
- Intent
- Topic
- Required freshness
- Search vertical
- Expected answer type
- Ambiguity
- Whether browsing is needed
Example
Query:
"latest Apple earnings"
Intent:
financial/news search
Freshness:
recent
Source preference:
financial reports and reputable news
👉 面试回答
Query understanding 决定用户在问什么、 答案需要多新、 应该使用哪些 retrieval sources, 以及 query 是否需要 rewrite 或 clarify。
7️⃣ Query Rewriting
为什么 Rewrite Queries?
User queries 可能不适合 search。
Examples
User query:
"openai latest model"
Rewritten queries:
"OpenAI latest model announcement 2026"
"OpenAI API latest model release"
Benefits
- Better retrieval recall
- Better source diversity
- Better exact matching
- Better freshness targeting
👉 面试回答
Query rewriting 通过把 user question 转换成 search-friendly queries 来改善 retrieval。
系统可以生成多个 query variants, 提升 recall、freshness 和 source diversity。
8️⃣ Retrieval Layer
Retrieval Sources
系统可以从这些地方 retrieve:
- Web index
- News index
- Academic index
- Internal knowledge base
- User files
- APIs
- Structured databases
Retrieval Flow
Query
→ Search Index
→ Candidate Sources
→ Fetch Snippets
→ Rank Results
Hybrid Retrieval
强系统通常结合:
- Keyword search
- Vector search
- Metadata filters
- Freshness filters
- Domain authority signals
👉 面试回答
Retrieval layer 负责找到 candidate sources。
Production AI search engine 通常结合 keyword search、vector search、 metadata filters、freshness signals 和 source authority signals。
9️⃣ Source Ranking
为什么 Ranking 重要?
不是所有 sources 都一样。
系统应该根据以下排序:
- Relevance
- Freshness
- Authority
- Trustworthiness
- Diversity
- Originality
- Accessibility
- User intent match
Example
For company financial results:
1. Official investor relations page
2. SEC filing
3. Reputable financial news
4. Blog summaries
👉 面试回答
Source ranking 决定哪些 documents 应该用于 answer generation。
系统应该优先使用 relevant、fresh、 authoritative、trustworthy 和 diverse sources。
🔟 Content Fetching and Extraction
为什么要 Fetch Full Content?
Search snippets 可能没有足够 context。
Content Extraction Steps
URL
→ Fetch page
→ Parse HTML
→ Remove boilerplate
→ Extract main content
→ Split into chunks
Challenges
- Paywalls
- Dynamic pages
- Duplicate content
- Ads and navigation
- Broken pages
- PDF parsing
- Rate limits
👉 面试回答
检索到 candidate URLs 后, 系统可能需要 fetch full pages、 extract main content、remove boilerplate、 chunk text, 然后只把 relevant parts 提供给 LLM。
1️⃣1️⃣ Context Builder
Context Builder 做什么?
Context builder 选择最好的 evidence 给 model。
它需要决定:
- Which sources to include
- Which snippets to include
- How many tokens to allocate
- How to preserve source metadata
- How to avoid duplicate evidence
Context Format
Source 1:
Title
URL
Relevant excerpt
Source 2:
Title
URL
Relevant excerpt
👉 面试回答
Context builder 把 ranked sources 转换成 compact evidence package 给 LLM。
它选择 relevant snippets, 保留 source metadata, 去重, 并控制 prompt 在 token limits 内。
1️⃣2️⃣ Answer Generation
Generation Step
LLM 接收:
- User question
- Retrieved context
- Citation metadata
- Answer instructions
- Safety rules
- Output format
Good Answer Should
- Directly answer the question
- Use retrieved sources
- Cite claims
- Mention uncertainty
- Avoid unsupported facts
- Explain briefly and clearly
Important Rule
If sources do not support the answer,
say the evidence is insufficient.
👉 面试回答
LLM 应该生成基于 retrieved sources 的 grounded answer。
它应该引用 supporting evidence, 避免 unsupported claims, 并在 sources 不足时明确说明。
1️⃣3️⃣ Citation System
为什么 Citations 重要?
Citations 让用户可以验证答案。
也提升 trust。
Citation Builder Responsibilities
- Link answer claims to source snippets
- Preserve URL and title
- Avoid fake citations
- Cite the strongest source
- Support multiple sources per claim
- Handle conflicting evidence
Bad Citation
Answer cites a source that does not support the claim.
Good Citation
Claim
→ Supported by retrieved source excerpt
→ Citation shown to user
👉 面试回答
Citations 对 AI search engines 很关键。
系统应该把 answer claims 映射到 retrieved source evidence, 并避免引用并不支持 claim 的 sources。
1️⃣4️⃣ Freshness
为什么 Freshness 重要?
Search queries 经常需要 current information。
Examples:
- News
- Sports
- Stock prices
- Product launches
- Laws
- Weather
- Recent papers
- Company announcements
Freshness Strategy
Query freshness classifier
→ If recent needed:
use news index / live web / APIs
→ Else:
use general index
👉 面试回答
Freshness 是核心设计需求。
系统应该检测 query 是否需要 recent information, 当 freshness 重要时, route 到 live search、news indexes 或 APIs。
1️⃣5️⃣ Handling Conflicting Sources
Problem
不同 sources 可能 disagree。
Example
Source A says product launches in June.
Source B says product launches in July.
Strategy
- Prefer primary sources
- Compare dates
- Show uncertainty
- Cite both sides
- Avoid overconfident answer
👉 面试回答
AI search systems 必须处理 conflicting evidence。
Answer 应该优先使用 primary 和 recent sources, 在必要时说明 disagreement, 并避免把 uncertain claims 说成事实。
1️⃣6️⃣ Multi-turn Search
为什么 Multi-turn 重要?
Users 会 refine search。
Example
User: Best laptops for AI development
Assistant: Gives options
User: Only under $1500
Assistant: Refines search
State Needed
- Previous query
- Retrieved sources
- User constraints
- Current answer context
- Follow-up intent
👉 面试回答
Multi-turn AI search 需要 conversation state。
系统应该追踪 previous queries、constraints、 retrieved sources 和 user intent, 这样 follow-up searches 才能 refine answer。
1️⃣7️⃣ Safety and Abuse Prevention
Risks
AI search systems 可能暴露:
- Misinformation
- Harmful instructions
- Unsafe advice
- Copyright-sensitive content
- Private data
- Malicious web pages
- Prompt injection from web content
Controls
- Source filtering
- Input moderation
- Output moderation
- Prompt injection defense
- Safe browsing filters
- Citation requirements
- High-risk query handling
👉 面试回答
AI search engines 需要 safety controls, 处理 harmful content、misinformation、 malicious pages、prompt injection、 unsafe advice 和 privacy risks。
Retrieved web content 应被视为 untrusted input。
1️⃣8️⃣ Cost Control
Cost Drivers
- Web search calls
- Page fetching
- Content extraction
- Embeddings
- Re-ranking
- LLM generation
- Long context
- Frequent refresh
Controls
- Query caching
- Retrieval caching
- Source deduplication
- Smaller model routing
- Token budget limits
- Re-rank only top candidates
- Cache fresh results with TTL
👉 面试回答
Cost control 很重要, 因为 AI search 可能调用 search APIs、 fetch pages、运行 ranking、 构建 long contexts 并调用 large models。
Caching、model routing、token budgets 和 candidate limits 可以控制 cost。
1️⃣9️⃣ Observability
What to Monitor
- Query latency
- Retrieval latency
- Fetch failures
- Source quality
- Citation accuracy
- Answer quality
- Freshness
- User feedback
- Cost per query
- Cache hit rate
- Safety blocks
Debugging Questions
- 哪些 sources 被 retrieved?
- 为什么它们 rank 高?
- 哪些 snippets 被发送给 model?
- 哪个 claim 使用了哪个 citation?
- Answer 是否 grounded?
👉 面试回答
Observability 应该 trace retrieval、ranking、 content fetching、context building、 generation、citations、latency、cost 和 user feedback。
没有这些, AI search quality 很难 debug。
2️⃣0️⃣ Best Practices
Practical Rules
- Use query understanding before retrieval
- Generate multiple search queries when useful
- Prefer authoritative sources
- Preserve citation metadata
- Keep context focused
- Detect freshness requirements
- Treat web content as untrusted
- Handle conflicting sources explicitly
- Cache carefully with TTL
- Evaluate citation correctness
Design Principle
AI search quality depends on retrieval,
ranking,
grounding,
and citations,
not just the LLM.
👉 面试回答
Production AI search engine 应结合 strong retrieval、source ranking、 context building、grounded generation、 citation validation、freshness detection、 safety、caching 和 observability。
🧠 Staff-Level Answer Final
👉 面试回答完整版本
设计 Perplexity-style AI search engine, 我会把它看作 search and retrieval product 加上 LLM generation layer, 而不是普通 chatbot。
系统从 user query 开始。
Query understanding service 会分类 intent、freshness requirement、 ambiguity、search vertical 和 expected answer type。
它可能把 query rewrite 成多个 search-friendly variants, 提升 retrieval recall 和 source diversity。
Retrieval layer 会从 web indexes、 news indexes、academic sources、 internal knowledge bases、user files 或 external APIs 中搜索。
Candidate sources 根据 relevance、freshness、 authority、trustworthiness、diversity 和 user intent match 进行 ranking。
对重要 queries, 系统可能 fetch full pages、 extract main content、remove boilerplate、 chunk text, 并只保留最 relevant snippets。
Context builder 会为 LLM 创建 compact evidence package。
它保留 source titles、URLs、snippets、 timestamps 和 citation metadata。
LLM 基于这些 evidence 生成答案。
它应该 cite sources、 避免 unsupported claims、 处理 conflicting evidence, 并在 evidence insufficient 时说明。
Citations 是核心 product feature。
Citation system 应该把 claims 映射到 actual supporting source snippets, 而不是随机附 URLs。
Freshness 也很关键。
系统应该检测 query 是否需要 recent information, 并使用 live search、news indexes 或 APIs, 而不是只依赖 stale indexes。
Multi-turn search 需要 state: previous queries、user constraints、 retrieved sources 和 follow-up intent。
Safety 很重要, 因为 retrieved web content 是 untrusted, 可能包含 misinformation、harmful content 或 prompt injection。
Cost control 需要 caching、source deduplication、 model routing、token budgeting 和限制 expensive re-ranking。
Observability 应该追踪 query rewriting、 retrieved sources、ranking decisions、 context sent to model、citations、 latency、cost 和 user feedback。
核心原则是: AI search quality depends on retrieval、 ranking、grounding 和 citations, not just the LLM。
⭐ Final Insight
Perplexity-style AI Search 的核心不是:
“Search + LLM 总结”
而是:
Query Understanding
- Query Rewriting
- Retrieval
- Source Ranking
- Content Extraction
- Context Building
- Grounded Generation
- Citation Mapping
- Freshness Detection
- Safety
- Observability。
最重要的一句话:
AI search quality depends on retrieval, ranking, grounding, and citations, not just the LLM.
Implement