aaa-psd AI Product & System Design ·

🎯 Design an AI Search Engine like Perplexity

1️⃣ Core Framework

When designing an AI Search Engine, I frame it as:

Product requirements
Query understanding
Search and retrieval
Source ranking
Answer generation
Citation and grounding
Freshness and crawling
Trade-offs: accuracy vs latency vs cost

2️⃣ Product Goal

An AI search engine answers user questions using retrieved web or knowledge sources.

Unlike traditional search, it does not only return links.

It returns:

Direct answer
Source citations
Relevant links
Follow-up questions
Search context
Optional summaries

Basic Flow

User Query
→ Understand Intent
→ Retrieve Sources
→ Rank Sources
→ Build Context
→ Generate Answer
→ Cite Sources

👉 Interview Answer

An AI search engine combines search retrieval with LLM generation.

It retrieves relevant sources, ranks them, builds grounded context, generates a direct answer, and provides citations so users can verify the result.

3️⃣ Functional Requirements

Core Features

The system should support:

User search query
Web retrieval
Source ranking
AI-generated answer
Citations
Follow-up questions
Search result links
Query history
Related queries

Advanced Features

Real-time news search
Academic search
Image search
Video search
File search
Personal knowledge search
Multi-turn search conversation
Search filters by date or source

👉 Interview Answer

The core requirements are query understanding, source retrieval, ranking, grounded answer generation, citations, and follow-up questions.

Advanced features include real-time search, vertical search, file search, personal search, and multi-turn search refinement.

4️⃣ Non-functional Requirements

Important System Qualities

The system should optimize for:

Answer accuracy
Source quality
Low latency
Freshness
High availability
Scalability
Safety
Cost efficiency
Citation reliability

Key Trade-off

More sources and stronger models
→ Better answer quality

But also
→ Higher latency and cost

👉 Interview Answer

Non-functional requirements include accuracy, freshness, source quality, low latency, scalability, reliability, safety, and cost efficiency.

The core trade-off is answer quality versus latency and cost.

5️⃣ High-Level Architecture

Architecture

Client
→ API Gateway
→ Search Service
→ Query Understanding
→ Web Search / Index Retrieval
→ Source Ranker
→ Content Fetcher
→ Context Builder
→ LLM Generator
→ Citation Builder
→ Response Streamer

Core Components

Query Understanding

Classifies intent and rewrites the query.

Retrieval Layer

Searches web index, news index, documents, or external APIs.

Source Ranker

Ranks sources by relevance, freshness, authority, and diversity.

Context Builder

Selects useful snippets for the LLM.

Citation Builder

Links answer claims to sources.

👉 Interview Answer

A Perplexity-style system includes query understanding, retrieval, source ranking, content fetching, context building, LLM generation, citation generation, streaming, safety, and observability.

6️⃣ Query Understanding

Why Query Understanding Matters

User queries are often vague.

The system needs to understand:

Intent
Topic
Required freshness
Search vertical
Expected answer type
Ambiguity
Whether browsing is needed

Example

Query:
"latest Apple earnings"

Intent:
financial/news search

Freshness:
recent

Source preference:
financial reports and reputable news

👉 Interview Answer

Query understanding decides what the user is asking, how fresh the answer needs to be, which retrieval sources to use, and whether the query should be rewritten or clarified.

7️⃣ Query Rewriting

Why Rewrite Queries?

User queries may not be optimal for search.

Examples

User query:
"openai latest model"

Rewritten queries:
"OpenAI latest model announcement 2026"
"OpenAI API latest model release"

Benefits

Better retrieval recall
Better source diversity
Better exact matching
Better freshness targeting

👉 Interview Answer

Query rewriting improves retrieval by converting a user question into search-friendly queries.

The system may generate multiple query variants to improve recall, freshness, and source diversity.

8️⃣ Retrieval Layer

Retrieval Sources

The system may retrieve from:

Web index
News index
Academic index
Internal knowledge base
User files
APIs
Structured databases

Retrieval Flow

Query
→ Search Index
→ Candidate Sources
→ Fetch Snippets
→ Rank Results

Hybrid Retrieval

A strong system often combines:

Keyword search
Vector search
Metadata filters
Freshness filters
Domain authority signals

👉 Interview Answer

The retrieval layer finds candidate sources.

A production AI search engine usually combines keyword search, vector search, metadata filters, freshness signals, and source authority signals.

9️⃣ Source Ranking

Why Ranking Matters

Not all sources are equal.

The system should rank by:

Relevance
Freshness
Authority
Trustworthiness
Diversity
Originality
Accessibility
User intent match

Example

For company financial results:
Official investor relations page
SEC filing
Reputable financial news
Blog summaries

👉 Interview Answer

Source ranking determines which documents should be used for answer generation.

The system should prefer relevant, fresh, authoritative, trustworthy, and diverse sources.

🔟 Content Fetching and Extraction

Why Fetch Full Content?

Search snippets may not contain enough context.

Content Extraction Steps

URL
→ Fetch page
→ Parse HTML
→ Remove boilerplate
→ Extract main content
→ Split into chunks

Challenges

Paywalls
Dynamic pages
Duplicate content
Ads and navigation
Broken pages
PDF parsing
Rate limits

👉 Interview Answer

After retrieving candidate URLs, the system may fetch full pages, extract main content, remove boilerplate, chunk the text, and pass only relevant parts to the LLM.

1️⃣1️⃣ Context Builder

What Context Builder Does

The context builder selects the best evidence for the model.

It must decide:

Which sources to include
Which snippets to include
How many tokens to allocate
How to preserve source metadata
How to avoid duplicate evidence

Context Format

Source 1:
Title
URL
Relevant excerpt

Source 2:
Title
URL
Relevant excerpt

👉 Interview Answer

The context builder converts ranked sources into a compact evidence package for the LLM.

It selects relevant snippets, preserves source metadata, removes duplicates, and keeps the prompt within token limits.

1️⃣2️⃣ Answer Generation

Generation Step

The LLM receives:

User question
Retrieved context
Citation metadata
Answer instructions
Safety rules
Output format

Good Answer Should

Directly answer the question
Use retrieved sources
Cite claims
Mention uncertainty
Avoid unsupported facts
Explain briefly and clearly

Important Rule

If sources do not support the answer,
say the evidence is insufficient.

👉 Interview Answer

The LLM should generate answers grounded in retrieved sources.

It should cite supporting evidence, avoid unsupported claims, and clearly state when the available sources are insufficient.

1️⃣3️⃣ Citation System

Why Citations Matter

Citations let users verify answers.

They also improve trust.

Citation Builder Responsibilities

Link answer claims to source snippets
Preserve URL and title
Avoid fake citations
Cite the strongest source
Support multiple sources per claim
Handle conflicting evidence

Bad Citation

Answer cites a source that does not support the claim.

Good Citation

Claim
→ Supported by retrieved source excerpt
→ Citation shown to user

👉 Interview Answer

Citations are critical for AI search engines.

The system should map answer claims to retrieved source evidence and avoid citing sources that do not actually support the claim.

1️⃣4️⃣ Freshness

Why Freshness Matters

Search queries often require current information.

Examples:

News
Sports
Stock prices
Product launches
Laws
Weather
Recent papers
Company announcements

Freshness Strategy

Query freshness classifier
→ If recent needed:
   use news index / live web / APIs
→ Else:
   use general index

👉 Interview Answer

Freshness is a key design requirement.

The system should detect whether a query needs recent information and route it to live search, news indexes, or APIs when freshness matters.

1️⃣5️⃣ Handling Conflicting Sources

Problem

Different sources may disagree.

Example

Source A says product launches in June.
Source B says product launches in July.

Strategy

Prefer primary sources
Compare dates
Show uncertainty
Cite both sides
Avoid overconfident answer

👉 Interview Answer

AI search systems must handle conflicting evidence.

The answer should prefer primary and recent sources, mention disagreement when relevant, and avoid presenting uncertain claims as facts.

1️⃣6️⃣ Multi-turn Search

Why Multi-turn Matters

Users may refine their search.

Example

User: Best laptops for AI development
Assistant: Gives options
User: Only under $1500
Assistant: Refines search

State Needed

Previous query
Retrieved sources
User constraints
Current answer context
Follow-up intent

👉 Interview Answer

Multi-turn AI search requires conversation state.

The system should track previous queries, constraints, retrieved sources, and user intent so follow-up searches can refine the answer.

1️⃣7️⃣ Safety and Abuse Prevention

Risks

AI search systems may surface:

Misinformation
Harmful instructions
Unsafe advice
Copyright-sensitive content
Private data
Malicious web pages
Prompt injection from web content

Controls

Source filtering
Input moderation
Output moderation
Prompt injection defense
Safe browsing filters
Citation requirements
High-risk query handling

👉 Interview Answer

AI search engines need safety controls for harmful content, misinformation, malicious pages, prompt injection, unsafe advice, and privacy risks.

Retrieved web content should be treated as untrusted input.

1️⃣8️⃣ Cost Control

Cost Drivers

Web search calls
Page fetching
Content extraction
Embeddings
Re-ranking
LLM generation
Long context
Frequent refresh

Controls

Query caching
Retrieval caching
Source deduplication
Smaller model routing
Token budget limits
Re-rank only top candidates
Cache fresh results with TTL

👉 Interview Answer

Cost control is important because AI search may call search APIs, fetch pages, run ranking, build long contexts, and call large models.

Caching, model routing, token budgets, and candidate limits help control cost.

1️⃣9️⃣ Observability

What to Monitor

Query latency
Retrieval latency
Fetch failures
Source quality
Citation accuracy
Answer quality
Freshness
User feedback
Cost per query
Cache hit rate
Safety blocks

Debugging Questions

Which sources were retrieved?
Why were they ranked?
Which snippets were sent to the model?
Which claim used which citation?
Was the answer grounded?

👉 Interview Answer

Observability should trace retrieval, ranking, content fetching, context building, generation, citations, latency, cost, and user feedback.

Without this, AI search quality is hard to debug.

2️⃣0️⃣ Best Practices

Practical Rules

Use query understanding before retrieval
Generate multiple search queries when useful
Prefer authoritative sources
Preserve citation metadata
Keep context focused
Detect freshness requirements
Treat web content as untrusted
Handle conflicting sources explicitly
Cache carefully with TTL
Evaluate citation correctness

Design Principle

AI search quality depends on retrieval,
ranking,
grounding,
and citations,
not just the LLM.

👉 Interview Answer

A production AI search engine should combine strong retrieval, source ranking, context building, grounded generation, citation validation, freshness detection, safety, caching, and observability.

🧠 Staff-Level Answer Final

👉 Interview Answer Full Version

To design an AI search engine like Perplexity, I would treat it as a search and retrieval product with an LLM generation layer, not just a chatbot.

The system starts with the user query.

A query understanding service classifies the intent, freshness requirement, ambiguity, search vertical, and expected answer type.

It may rewrite the query into multiple search-friendly variants to improve retrieval recall and source diversity.

The retrieval layer searches across web indexes, news indexes, academic sources, internal knowledge bases, user files, or external APIs.

Candidate sources are ranked by relevance, freshness, authority, trustworthiness, diversity, and user intent match.

For important queries, the system may fetch full pages, extract main content, remove boilerplate, chunk the text, and keep only the most relevant snippets.

The context builder then creates a compact evidence package for the LLM.

It preserves source titles, URLs, snippets, timestamps, and citation metadata.

The LLM generates an answer grounded in that evidence.

It should cite sources, avoid unsupported claims, handle conflicting evidence, and say when the evidence is insufficient.

Citations are a core product feature.

The citation system should map claims to actual supporting source snippets, not just attach random URLs.

Freshness is also critical.

The system should detect when a query requires recent information and use live search, news indexes, or APIs instead of relying only on stale indexes.

Multi-turn search requires state: previous queries, user constraints, retrieved sources, and follow-up intent.

Safety is important because retrieved web content is untrusted and may contain misinformation, harmful content, or prompt injection.

Cost control requires caching, source deduplication, model routing, token budgeting, and limiting expensive re-ranking.

Observability should trace query rewriting, retrieved sources, ranking decisions, context sent to the model, citations, latency, cost, and user feedback.

The key principle is that AI search quality depends on retrieval, ranking, grounding, and citations, not just the LLM.

⭐ Final Insight

Perplexity-style AI Search 的核心不是：

“Search + LLM 总结”

而是：

Query Understanding

Query Rewriting

Retrieval

Source Ranking

Content Extraction

Context Building

Grounded Generation

Citation Mapping

Freshness Detection

Safety

Observability。

最重要的一句话：

AI search quality depends on retrieval, ranking, grounding, and citations, not just the LLM.

中文部分

🎯 Design an AI Search Engine like Perplexity

1️⃣ 核心框架

设计 AI Search Engine 时，我通常从这些方面分析：

Product requirements
Query understanding
Search and retrieval
Source ranking
Answer generation
Citation and grounding
Freshness and crawling
核心权衡：accuracy vs latency vs cost

2️⃣ Product Goal

AI search engine 使用 retrieved web 或 knowledge sources 回答用户问题。

和传统 search 不同，它不只是返回 links。

它会返回：

Direct answer
Source citations
Relevant links
Follow-up questions
Search context
Optional summaries

Basic Flow

User Query
→ Understand Intent
→ Retrieve Sources
→ Rank Sources
→ Build Context
→ Generate Answer
→ Cite Sources

👉 面试回答

AI search engine 结合 search retrieval 和 LLM generation。

它检索相关 sources，对 sources 排序，构建 grounded context，生成直接答案，并提供 citations 让用户可以验证结果。

3️⃣ Functional Requirements

Core Features

系统应该支持：

User search query
Web retrieval
Source ranking
AI-generated answer
Citations
Follow-up questions
Search result links
Query history
Related queries

Advanced Features

Real-time news search
Academic search
Image search
Video search
File search
Personal knowledge search
Multi-turn search conversation
Search filters by date or source

👉 面试回答

核心需求包括 query understanding、 source retrieval、ranking、 grounded answer generation、citations 和 follow-up questions。

Advanced features 包括 real-time search、 vertical search、file search、 personal search 和 multi-turn search refinement。

4️⃣ Non-functional Requirements

Important System Qualities

系统应该优化：

Answer accuracy
Source quality
Low latency
Freshness
High availability
Scalability
Safety
Cost efficiency
Citation reliability

Key Trade-off

More sources and stronger models
→ Better answer quality

But also
→ Higher latency and cost

👉 面试回答

Non-functional requirements 包括 accuracy、 freshness、source quality、low latency、 scalability、reliability、safety 和 cost efficiency。

核心权衡是 answer quality 和 latency / cost。

5️⃣ High-Level Architecture

Architecture

Client
→ API Gateway
→ Search Service
→ Query Understanding
→ Web Search / Index Retrieval
→ Source Ranker
→ Content Fetcher
→ Context Builder
→ LLM Generator
→ Citation Builder
→ Response Streamer

Core Components

Query Understanding

分类 intent 并 rewrite query。

Retrieval Layer

搜索 web index、news index、 documents 或 external APIs。

Source Ranker

根据 relevance、freshness、 authority 和 diversity 排序 sources。

Context Builder

为 LLM 选择有用 snippets。

Citation Builder

把 answer claims 连接到 sources。

👉 面试回答

Perplexity-style system 包括 query understanding、 retrieval、source ranking、content fetching、 context building、LLM generation、 citation generation、streaming、 safety 和 observability。

6️⃣ Query Understanding

为什么 Query Understanding 重要？

User queries 经常很模糊。

系统需要理解：

Intent
Topic
Required freshness
Search vertical
Expected answer type
Ambiguity
Whether browsing is needed

Example

Query:
"latest Apple earnings"

Intent:
financial/news search

Freshness:
recent

Source preference:
financial reports and reputable news

👉 面试回答

Query understanding 决定用户在问什么、答案需要多新、应该使用哪些 retrieval sources，以及 query 是否需要 rewrite 或 clarify。

7️⃣ Query Rewriting

为什么 Rewrite Queries？

User queries 可能不适合 search。

Examples

User query:
"openai latest model"

Rewritten queries:
"OpenAI latest model announcement 2026"
"OpenAI API latest model release"

Benefits

Better retrieval recall
Better source diversity
Better exact matching
Better freshness targeting

👉 面试回答

Query rewriting 通过把 user question 转换成 search-friendly queries 来改善 retrieval。

系统可以生成多个 query variants，提升 recall、freshness 和 source diversity。

8️⃣ Retrieval Layer

Retrieval Sources

系统可以从这些地方 retrieve：

Web index
News index
Academic index
Internal knowledge base
User files
APIs
Structured databases

Retrieval Flow

Query
→ Search Index
→ Candidate Sources
→ Fetch Snippets
→ Rank Results

Hybrid Retrieval

强系统通常结合：

Keyword search
Vector search
Metadata filters
Freshness filters
Domain authority signals

👉 面试回答

Retrieval layer 负责找到 candidate sources。

Production AI search engine 通常结合 keyword search、vector search、 metadata filters、freshness signals 和 source authority signals。

9️⃣ Source Ranking

为什么 Ranking 重要？

不是所有 sources 都一样。

系统应该根据以下排序：

Relevance
Freshness
Authority
Trustworthiness
Diversity
Originality
Accessibility
User intent match

Example

For company financial results:
Official investor relations page
SEC filing
Reputable financial news
Blog summaries

👉 面试回答

Source ranking 决定哪些 documents 应该用于 answer generation。

系统应该优先使用 relevant、fresh、 authoritative、trustworthy 和 diverse sources。

🔟 Content Fetching and Extraction

为什么要 Fetch Full Content？

Search snippets 可能没有足够 context。

Content Extraction Steps

URL
→ Fetch page
→ Parse HTML
→ Remove boilerplate
→ Extract main content
→ Split into chunks

Challenges

Paywalls
Dynamic pages
Duplicate content
Ads and navigation
Broken pages
PDF parsing
Rate limits

👉 面试回答

检索到 candidate URLs 后，系统可能需要 fetch full pages、 extract main content、remove boilerplate、 chunk text，然后只把 relevant parts 提供给 LLM。

1️⃣1️⃣ Context Builder

Context Builder 做什么？

Context builder 选择最好的 evidence 给 model。

它需要决定：

Which sources to include
Which snippets to include
How many tokens to allocate
How to preserve source metadata
How to avoid duplicate evidence

Context Format

Source 1:
Title
URL
Relevant excerpt

Source 2:
Title
URL
Relevant excerpt

👉 面试回答

Context builder 把 ranked sources 转换成 compact evidence package 给 LLM。

它选择 relevant snippets，保留 source metadata，去重，并控制 prompt 在 token limits 内。

1️⃣2️⃣ Answer Generation

Generation Step

LLM 接收：

User question
Retrieved context
Citation metadata
Answer instructions
Safety rules
Output format

Good Answer Should

Directly answer the question
Use retrieved sources
Cite claims
Mention uncertainty
Avoid unsupported facts
Explain briefly and clearly

Important Rule

If sources do not support the answer,
say the evidence is insufficient.

👉 面试回答

LLM 应该生成基于 retrieved sources 的 grounded answer。

它应该引用 supporting evidence，避免 unsupported claims，并在 sources 不足时明确说明。

1️⃣3️⃣ Citation System

为什么 Citations 重要？

Citations 让用户可以验证答案。

也提升 trust。

Citation Builder Responsibilities

Link answer claims to source snippets
Preserve URL and title
Avoid fake citations
Cite the strongest source
Support multiple sources per claim
Handle conflicting evidence

Bad Citation

Answer cites a source that does not support the claim.

Good Citation

Claim
→ Supported by retrieved source excerpt
→ Citation shown to user

👉 面试回答

Citations 对 AI search engines 很关键。

系统应该把 answer claims 映射到 retrieved source evidence，并避免引用并不支持 claim 的 sources。

1️⃣4️⃣ Freshness

为什么 Freshness 重要？

Search queries 经常需要 current information。

Examples:

News
Sports
Stock prices
Product launches
Laws
Weather
Recent papers
Company announcements

Freshness Strategy

Query freshness classifier
→ If recent needed:
   use news index / live web / APIs
→ Else:
   use general index

👉 面试回答

Freshness 是核心设计需求。

系统应该检测 query 是否需要 recent information，当 freshness 重要时， route 到 live search、news indexes 或 APIs。

1️⃣5️⃣ Handling Conflicting Sources

Problem

不同 sources 可能 disagree。

Example

Source A says product launches in June.
Source B says product launches in July.

Strategy

Prefer primary sources
Compare dates
Show uncertainty
Cite both sides
Avoid overconfident answer

👉 面试回答

AI search systems 必须处理 conflicting evidence。

Answer 应该优先使用 primary 和 recent sources，在必要时说明 disagreement，并避免把 uncertain claims 说成事实。

1️⃣6️⃣ Multi-turn Search

为什么 Multi-turn 重要？

Users 会 refine search。

Example

User: Best laptops for AI development
Assistant: Gives options
User: Only under $1500
Assistant: Refines search

State Needed

Previous query
Retrieved sources
User constraints
Current answer context
Follow-up intent

👉 面试回答

Multi-turn AI search 需要 conversation state。

系统应该追踪 previous queries、constraints、 retrieved sources 和 user intent，这样 follow-up searches 才能 refine answer。

1️⃣7️⃣ Safety and Abuse Prevention

Risks

AI search systems 可能暴露：

Misinformation
Harmful instructions
Unsafe advice
Copyright-sensitive content
Private data
Malicious web pages
Prompt injection from web content

Controls

Source filtering
Input moderation
Output moderation
Prompt injection defense
Safe browsing filters
Citation requirements
High-risk query handling

👉 面试回答

AI search engines 需要 safety controls，处理 harmful content、misinformation、 malicious pages、prompt injection、 unsafe advice 和 privacy risks。

Retrieved web content 应被视为 untrusted input。

1️⃣8️⃣ Cost Control

Cost Drivers

Web search calls
Page fetching
Content extraction
Embeddings
Re-ranking
LLM generation
Long context
Frequent refresh

Controls

Query caching
Retrieval caching
Source deduplication
Smaller model routing
Token budget limits
Re-rank only top candidates
Cache fresh results with TTL

👉 面试回答

Cost control 很重要，因为 AI search 可能调用 search APIs、 fetch pages、运行 ranking、构建 long contexts 并调用 large models。

Caching、model routing、token budgets 和 candidate limits 可以控制 cost。

1️⃣9️⃣ Observability

What to Monitor

Query latency
Retrieval latency
Fetch failures
Source quality
Citation accuracy
Answer quality
Freshness
User feedback
Cost per query
Cache hit rate
Safety blocks

Debugging Questions

哪些 sources 被 retrieved？
为什么它们 rank 高？
哪些 snippets 被发送给 model？
哪个 claim 使用了哪个 citation？
Answer 是否 grounded？

👉 面试回答

Observability 应该 trace retrieval、ranking、 content fetching、context building、 generation、citations、latency、cost 和 user feedback。

没有这些， AI search quality 很难 debug。

2️⃣0️⃣ Best Practices

Practical Rules

Use query understanding before retrieval
Generate multiple search queries when useful
Prefer authoritative sources
Preserve citation metadata
Keep context focused
Detect freshness requirements
Treat web content as untrusted
Handle conflicting sources explicitly
Cache carefully with TTL
Evaluate citation correctness

Design Principle

AI search quality depends on retrieval,
ranking,
grounding,
and citations,
not just the LLM.

👉 面试回答

Production AI search engine 应结合 strong retrieval、source ranking、 context building、grounded generation、 citation validation、freshness detection、 safety、caching 和 observability。

🧠 Staff-Level Answer Final

👉 面试回答完整版本

设计 Perplexity-style AI search engine，我会把它看作 search and retrieval product 加上 LLM generation layer，而不是普通 chatbot。

系统从 user query 开始。

Query understanding service 会分类 intent、freshness requirement、 ambiguity、search vertical 和 expected answer type。

它可能把 query rewrite 成多个 search-friendly variants，提升 retrieval recall 和 source diversity。

Retrieval layer 会从 web indexes、 news indexes、academic sources、 internal knowledge bases、user files 或 external APIs 中搜索。

Candidate sources 根据 relevance、freshness、 authority、trustworthiness、diversity 和 user intent match 进行 ranking。

对重要 queries，系统可能 fetch full pages、 extract main content、remove boilerplate、 chunk text，并只保留最 relevant snippets。

Context builder 会为 LLM 创建 compact evidence package。

它保留 source titles、URLs、snippets、 timestamps 和 citation metadata。

LLM 基于这些 evidence 生成答案。

它应该 cite sources、避免 unsupported claims、处理 conflicting evidence，并在 evidence insufficient 时说明。

Citations 是核心 product feature。

Citation system 应该把 claims 映射到 actual supporting source snippets，而不是随机附 URLs。

Freshness 也很关键。

系统应该检测 query 是否需要 recent information，并使用 live search、news indexes 或 APIs，而不是只依赖 stale indexes。

Multi-turn search 需要 state： previous queries、user constraints、 retrieved sources 和 follow-up intent。

Safety 很重要，因为 retrieved web content 是 untrusted，可能包含 misinformation、harmful content 或 prompt injection。

Cost control 需要 caching、source deduplication、 model routing、token budgeting 和限制 expensive re-ranking。

Observability 应该追踪 query rewriting、 retrieved sources、ranking decisions、 context sent to model、citations、 latency、cost 和 user feedback。

核心原则是： AI search quality depends on retrieval、 ranking、grounding 和 citations， not just the LLM。

⭐ Final Insight

Perplexity-style AI Search 的核心不是：

“Search + LLM 总结”

而是：

Query Understanding

Query Rewriting

Retrieval

Source Ranking

Content Extraction

Context Building

Grounded Generation

Citation Mapping

Freshness Detection

Safety

Observability。

最重要的一句话：

AI search quality depends on retrieval, ranking, grounding, and citations, not just the LLM.