·

System Design Deep Dive - 02 Design an AI Search Engine like Perplexity

Post by ailswan May. 24, 2026

中文 ↓

🎯 Design an AI Search Engine like Perplexity


1️⃣ Core Framework

When designing an AI Search Engine, I frame it as:

  1. Product requirements
  2. Query understanding
  3. Search and retrieval
  4. Source ranking
  5. Answer generation
  6. Citation and grounding
  7. Freshness and crawling
  8. Trade-offs: accuracy vs latency vs cost

2️⃣ Product Goal

An AI search engine answers user questions using retrieved web or knowledge sources.

Unlike traditional search, it does not only return links.

It returns:


Basic Flow

User Query
→ Understand Intent
→ Retrieve Sources
→ Rank Sources
→ Build Context
→ Generate Answer
→ Cite Sources

👉 Interview Answer

An AI search engine combines search retrieval with LLM generation.

It retrieves relevant sources, ranks them, builds grounded context, generates a direct answer, and provides citations so users can verify the result.


3️⃣ Functional Requirements


Core Features

The system should support:


Advanced Features


👉 Interview Answer

The core requirements are query understanding, source retrieval, ranking, grounded answer generation, citations, and follow-up questions.

Advanced features include real-time search, vertical search, file search, personal search, and multi-turn search refinement.


4️⃣ Non-functional Requirements


Important System Qualities

The system should optimize for:


Key Trade-off

More sources and stronger models
→ Better answer quality

But also
→ Higher latency and cost

👉 Interview Answer

Non-functional requirements include accuracy, freshness, source quality, low latency, scalability, reliability, safety, and cost efficiency.

The core trade-off is answer quality versus latency and cost.


5️⃣ High-Level Architecture


Architecture

Client
→ API Gateway
→ Search Service
→ Query Understanding
→ Web Search / Index Retrieval
→ Source Ranker
→ Content Fetcher
→ Context Builder
→ LLM Generator
→ Citation Builder
→ Response Streamer

Core Components

Query Understanding

Classifies intent and rewrites the query.


Retrieval Layer

Searches web index, news index, documents, or external APIs.


Source Ranker

Ranks sources by relevance, freshness, authority, and diversity.


Context Builder

Selects useful snippets for the LLM.


Citation Builder

Links answer claims to sources.


👉 Interview Answer

A Perplexity-style system includes query understanding, retrieval, source ranking, content fetching, context building, LLM generation, citation generation, streaming, safety, and observability.


6️⃣ Query Understanding


Why Query Understanding Matters

User queries are often vague.

The system needs to understand:


Example

Query:
"latest Apple earnings"

Intent:
financial/news search

Freshness:
recent

Source preference:
financial reports and reputable news

👉 Interview Answer

Query understanding decides what the user is asking, how fresh the answer needs to be, which retrieval sources to use, and whether the query should be rewritten or clarified.


7️⃣ Query Rewriting


Why Rewrite Queries?

User queries may not be optimal for search.


Examples

User query:
"openai latest model"

Rewritten queries:
"OpenAI latest model announcement 2026"
"OpenAI API latest model release"

Benefits


👉 Interview Answer

Query rewriting improves retrieval by converting a user question into search-friendly queries.

The system may generate multiple query variants to improve recall, freshness, and source diversity.


8️⃣ Retrieval Layer


Retrieval Sources

The system may retrieve from:


Retrieval Flow

Query
→ Search Index
→ Candidate Sources
→ Fetch Snippets
→ Rank Results

Hybrid Retrieval

A strong system often combines:


👉 Interview Answer

The retrieval layer finds candidate sources.

A production AI search engine usually combines keyword search, vector search, metadata filters, freshness signals, and source authority signals.


9️⃣ Source Ranking


Why Ranking Matters

Not all sources are equal.

The system should rank by:


Example

For company financial results:
1. Official investor relations page
2. SEC filing
3. Reputable financial news
4. Blog summaries

👉 Interview Answer

Source ranking determines which documents should be used for answer generation.

The system should prefer relevant, fresh, authoritative, trustworthy, and diverse sources.


🔟 Content Fetching and Extraction


Why Fetch Full Content?

Search snippets may not contain enough context.


Content Extraction Steps

URL
→ Fetch page
→ Parse HTML
→ Remove boilerplate
→ Extract main content
→ Split into chunks

Challenges


👉 Interview Answer

After retrieving candidate URLs, the system may fetch full pages, extract main content, remove boilerplate, chunk the text, and pass only relevant parts to the LLM.


1️⃣1️⃣ Context Builder


What Context Builder Does

The context builder selects the best evidence for the model.

It must decide:


Context Format

Source 1:
Title
URL
Relevant excerpt

Source 2:
Title
URL
Relevant excerpt

👉 Interview Answer

The context builder converts ranked sources into a compact evidence package for the LLM.

It selects relevant snippets, preserves source metadata, removes duplicates, and keeps the prompt within token limits.


1️⃣2️⃣ Answer Generation


Generation Step

The LLM receives:


Good Answer Should


Important Rule

If sources do not support the answer,
say the evidence is insufficient.

👉 Interview Answer

The LLM should generate answers grounded in retrieved sources.

It should cite supporting evidence, avoid unsupported claims, and clearly state when the available sources are insufficient.


1️⃣3️⃣ Citation System


Why Citations Matter

Citations let users verify answers.

They also improve trust.


Citation Builder Responsibilities


Bad Citation

Answer cites a source that does not support the claim.

Good Citation

Claim
→ Supported by retrieved source excerpt
→ Citation shown to user

👉 Interview Answer

Citations are critical for AI search engines.

The system should map answer claims to retrieved source evidence and avoid citing sources that do not actually support the claim.


1️⃣4️⃣ Freshness


Why Freshness Matters

Search queries often require current information.

Examples:


Freshness Strategy

Query freshness classifier
→ If recent needed:
   use news index / live web / APIs
→ Else:
   use general index

👉 Interview Answer

Freshness is a key design requirement.

The system should detect whether a query needs recent information and route it to live search, news indexes, or APIs when freshness matters.


1️⃣5️⃣ Handling Conflicting Sources


Problem

Different sources may disagree.


Example

Source A says product launches in June.
Source B says product launches in July.

Strategy


👉 Interview Answer

AI search systems must handle conflicting evidence.

The answer should prefer primary and recent sources, mention disagreement when relevant, and avoid presenting uncertain claims as facts.


1️⃣6️⃣ Multi-turn Search


Why Multi-turn Matters

Users may refine their search.


Example

User: Best laptops for AI development
Assistant: Gives options
User: Only under $1500
Assistant: Refines search

State Needed


👉 Interview Answer

Multi-turn AI search requires conversation state.

The system should track previous queries, constraints, retrieved sources, and user intent so follow-up searches can refine the answer.


1️⃣7️⃣ Safety and Abuse Prevention


Risks

AI search systems may surface:


Controls


👉 Interview Answer

AI search engines need safety controls for harmful content, misinformation, malicious pages, prompt injection, unsafe advice, and privacy risks.

Retrieved web content should be treated as untrusted input.


1️⃣8️⃣ Cost Control


Cost Drivers


Controls


👉 Interview Answer

Cost control is important because AI search may call search APIs, fetch pages, run ranking, build long contexts, and call large models.

Caching, model routing, token budgets, and candidate limits help control cost.


1️⃣9️⃣ Observability


What to Monitor


Debugging Questions


👉 Interview Answer

Observability should trace retrieval, ranking, content fetching, context building, generation, citations, latency, cost, and user feedback.

Without this, AI search quality is hard to debug.


2️⃣0️⃣ Best Practices


Practical Rules


Design Principle

AI search quality depends on retrieval,
ranking,
grounding,
and citations,
not just the LLM.

👉 Interview Answer

A production AI search engine should combine strong retrieval, source ranking, context building, grounded generation, citation validation, freshness detection, safety, caching, and observability.


🧠 Staff-Level Answer Final


👉 Interview Answer Full Version

To design an AI search engine like Perplexity, I would treat it as a search and retrieval product with an LLM generation layer, not just a chatbot.

The system starts with the user query.

A query understanding service classifies the intent, freshness requirement, ambiguity, search vertical, and expected answer type.

It may rewrite the query into multiple search-friendly variants to improve retrieval recall and source diversity.

The retrieval layer searches across web indexes, news indexes, academic sources, internal knowledge bases, user files, or external APIs.

Candidate sources are ranked by relevance, freshness, authority, trustworthiness, diversity, and user intent match.

For important queries, the system may fetch full pages, extract main content, remove boilerplate, chunk the text, and keep only the most relevant snippets.

The context builder then creates a compact evidence package for the LLM.

It preserves source titles, URLs, snippets, timestamps, and citation metadata.

The LLM generates an answer grounded in that evidence.

It should cite sources, avoid unsupported claims, handle conflicting evidence, and say when the evidence is insufficient.

Citations are a core product feature.

The citation system should map claims to actual supporting source snippets, not just attach random URLs.

Freshness is also critical.

The system should detect when a query requires recent information and use live search, news indexes, or APIs instead of relying only on stale indexes.

Multi-turn search requires state: previous queries, user constraints, retrieved sources, and follow-up intent.

Safety is important because retrieved web content is untrusted and may contain misinformation, harmful content, or prompt injection.

Cost control requires caching, source deduplication, model routing, token budgeting, and limiting expensive re-ranking.

Observability should trace query rewriting, retrieved sources, ranking decisions, context sent to the model, citations, latency, cost, and user feedback.

The key principle is that AI search quality depends on retrieval, ranking, grounding, and citations, not just the LLM.


⭐ Final Insight

Perplexity-style AI Search 的核心不是:

“Search + LLM 总结”

而是:

Query Understanding

  • Query Rewriting
  • Retrieval
  • Source Ranking
  • Content Extraction
  • Context Building
  • Grounded Generation
  • Citation Mapping
  • Freshness Detection
  • Safety
  • Observability。

最重要的一句话:

AI search quality depends on retrieval, ranking, grounding, and citations, not just the LLM.


中文部分


🎯 Design an AI Search Engine like Perplexity


1️⃣ 核心框架

设计 AI Search Engine 时,我通常从这些方面分析:

  1. Product requirements
  2. Query understanding
  3. Search and retrieval
  4. Source ranking
  5. Answer generation
  6. Citation and grounding
  7. Freshness and crawling
  8. 核心权衡:accuracy vs latency vs cost

2️⃣ Product Goal

AI search engine 使用 retrieved web 或 knowledge sources 回答用户问题。

和传统 search 不同, 它不只是返回 links。

它会返回:


Basic Flow

User Query
→ Understand Intent
→ Retrieve Sources
→ Rank Sources
→ Build Context
→ Generate Answer
→ Cite Sources

👉 面试回答

AI search engine 结合 search retrieval 和 LLM generation。

它检索相关 sources, 对 sources 排序, 构建 grounded context, 生成直接答案, 并提供 citations 让用户可以验证结果。


3️⃣ Functional Requirements


Core Features

系统应该支持:


Advanced Features


👉 面试回答

核心需求包括 query understanding、 source retrieval、ranking、 grounded answer generation、citations 和 follow-up questions。

Advanced features 包括 real-time search、 vertical search、file search、 personal search 和 multi-turn search refinement。


4️⃣ Non-functional Requirements


Important System Qualities

系统应该优化:


Key Trade-off

More sources and stronger models
→ Better answer quality

But also
→ Higher latency and cost

👉 面试回答

Non-functional requirements 包括 accuracy、 freshness、source quality、low latency、 scalability、reliability、safety 和 cost efficiency。

核心权衡是 answer quality 和 latency / cost。


5️⃣ High-Level Architecture


Architecture

Client
→ API Gateway
→ Search Service
→ Query Understanding
→ Web Search / Index Retrieval
→ Source Ranker
→ Content Fetcher
→ Context Builder
→ LLM Generator
→ Citation Builder
→ Response Streamer

Core Components

Query Understanding

分类 intent 并 rewrite query。


Retrieval Layer

搜索 web index、news index、 documents 或 external APIs。


Source Ranker

根据 relevance、freshness、 authority 和 diversity 排序 sources。


Context Builder

为 LLM 选择有用 snippets。


Citation Builder

把 answer claims 连接到 sources。


👉 面试回答

Perplexity-style system 包括 query understanding、 retrieval、source ranking、content fetching、 context building、LLM generation、 citation generation、streaming、 safety 和 observability。


6️⃣ Query Understanding


为什么 Query Understanding 重要?

User queries 经常很模糊。

系统需要理解:


Example

Query:
"latest Apple earnings"

Intent:
financial/news search

Freshness:
recent

Source preference:
financial reports and reputable news

👉 面试回答

Query understanding 决定用户在问什么、 答案需要多新、 应该使用哪些 retrieval sources, 以及 query 是否需要 rewrite 或 clarify。


7️⃣ Query Rewriting


为什么 Rewrite Queries?

User queries 可能不适合 search。


Examples

User query:
"openai latest model"

Rewritten queries:
"OpenAI latest model announcement 2026"
"OpenAI API latest model release"

Benefits


👉 面试回答

Query rewriting 通过把 user question 转换成 search-friendly queries 来改善 retrieval。

系统可以生成多个 query variants, 提升 recall、freshness 和 source diversity。


8️⃣ Retrieval Layer


Retrieval Sources

系统可以从这些地方 retrieve:


Retrieval Flow

Query
→ Search Index
→ Candidate Sources
→ Fetch Snippets
→ Rank Results

Hybrid Retrieval

强系统通常结合:


👉 面试回答

Retrieval layer 负责找到 candidate sources。

Production AI search engine 通常结合 keyword search、vector search、 metadata filters、freshness signals 和 source authority signals。


9️⃣ Source Ranking


为什么 Ranking 重要?

不是所有 sources 都一样。

系统应该根据以下排序:


Example

For company financial results:
1. Official investor relations page
2. SEC filing
3. Reputable financial news
4. Blog summaries

👉 面试回答

Source ranking 决定哪些 documents 应该用于 answer generation。

系统应该优先使用 relevant、fresh、 authoritative、trustworthy 和 diverse sources。


🔟 Content Fetching and Extraction


为什么要 Fetch Full Content?

Search snippets 可能没有足够 context。


Content Extraction Steps

URL
→ Fetch page
→ Parse HTML
→ Remove boilerplate
→ Extract main content
→ Split into chunks

Challenges


👉 面试回答

检索到 candidate URLs 后, 系统可能需要 fetch full pages、 extract main content、remove boilerplate、 chunk text, 然后只把 relevant parts 提供给 LLM。


1️⃣1️⃣ Context Builder


Context Builder 做什么?

Context builder 选择最好的 evidence 给 model。

它需要决定:


Context Format

Source 1:
Title
URL
Relevant excerpt

Source 2:
Title
URL
Relevant excerpt

👉 面试回答

Context builder 把 ranked sources 转换成 compact evidence package 给 LLM。

它选择 relevant snippets, 保留 source metadata, 去重, 并控制 prompt 在 token limits 内。


1️⃣2️⃣ Answer Generation


Generation Step

LLM 接收:


Good Answer Should


Important Rule

If sources do not support the answer,
say the evidence is insufficient.

👉 面试回答

LLM 应该生成基于 retrieved sources 的 grounded answer。

它应该引用 supporting evidence, 避免 unsupported claims, 并在 sources 不足时明确说明。


1️⃣3️⃣ Citation System


为什么 Citations 重要?

Citations 让用户可以验证答案。

也提升 trust。


Citation Builder Responsibilities


Bad Citation

Answer cites a source that does not support the claim.

Good Citation

Claim
→ Supported by retrieved source excerpt
→ Citation shown to user

👉 面试回答

Citations 对 AI search engines 很关键。

系统应该把 answer claims 映射到 retrieved source evidence, 并避免引用并不支持 claim 的 sources。


1️⃣4️⃣ Freshness


为什么 Freshness 重要?

Search queries 经常需要 current information。

Examples:


Freshness Strategy

Query freshness classifier
→ If recent needed:
   use news index / live web / APIs
→ Else:
   use general index

👉 面试回答

Freshness 是核心设计需求。

系统应该检测 query 是否需要 recent information, 当 freshness 重要时, route 到 live search、news indexes 或 APIs。


1️⃣5️⃣ Handling Conflicting Sources


Problem

不同 sources 可能 disagree。


Example

Source A says product launches in June.
Source B says product launches in July.

Strategy


👉 面试回答

AI search systems 必须处理 conflicting evidence。

Answer 应该优先使用 primary 和 recent sources, 在必要时说明 disagreement, 并避免把 uncertain claims 说成事实。


1️⃣6️⃣ Multi-turn Search


为什么 Multi-turn 重要?

Users 会 refine search。


Example

User: Best laptops for AI development
Assistant: Gives options
User: Only under $1500
Assistant: Refines search

State Needed


👉 面试回答

Multi-turn AI search 需要 conversation state。

系统应该追踪 previous queries、constraints、 retrieved sources 和 user intent, 这样 follow-up searches 才能 refine answer。


1️⃣7️⃣ Safety and Abuse Prevention


Risks

AI search systems 可能暴露:


Controls


👉 面试回答

AI search engines 需要 safety controls, 处理 harmful content、misinformation、 malicious pages、prompt injection、 unsafe advice 和 privacy risks。

Retrieved web content 应被视为 untrusted input。


1️⃣8️⃣ Cost Control


Cost Drivers


Controls


👉 面试回答

Cost control 很重要, 因为 AI search 可能调用 search APIs、 fetch pages、运行 ranking、 构建 long contexts 并调用 large models。

Caching、model routing、token budgets 和 candidate limits 可以控制 cost。


1️⃣9️⃣ Observability


What to Monitor


Debugging Questions


👉 面试回答

Observability 应该 trace retrieval、ranking、 content fetching、context building、 generation、citations、latency、cost 和 user feedback。

没有这些, AI search quality 很难 debug。


2️⃣0️⃣ Best Practices


Practical Rules


Design Principle

AI search quality depends on retrieval,
ranking,
grounding,
and citations,
not just the LLM.

👉 面试回答

Production AI search engine 应结合 strong retrieval、source ranking、 context building、grounded generation、 citation validation、freshness detection、 safety、caching 和 observability。


🧠 Staff-Level Answer Final


👉 面试回答完整版本

设计 Perplexity-style AI search engine, 我会把它看作 search and retrieval product 加上 LLM generation layer, 而不是普通 chatbot。

系统从 user query 开始。

Query understanding service 会分类 intent、freshness requirement、 ambiguity、search vertical 和 expected answer type。

它可能把 query rewrite 成多个 search-friendly variants, 提升 retrieval recall 和 source diversity。

Retrieval layer 会从 web indexes、 news indexes、academic sources、 internal knowledge bases、user files 或 external APIs 中搜索。

Candidate sources 根据 relevance、freshness、 authority、trustworthiness、diversity 和 user intent match 进行 ranking。

对重要 queries, 系统可能 fetch full pages、 extract main content、remove boilerplate、 chunk text, 并只保留最 relevant snippets。

Context builder 会为 LLM 创建 compact evidence package。

它保留 source titles、URLs、snippets、 timestamps 和 citation metadata。

LLM 基于这些 evidence 生成答案。

它应该 cite sources、 避免 unsupported claims、 处理 conflicting evidence, 并在 evidence insufficient 时说明。

Citations 是核心 product feature。

Citation system 应该把 claims 映射到 actual supporting source snippets, 而不是随机附 URLs。

Freshness 也很关键。

系统应该检测 query 是否需要 recent information, 并使用 live search、news indexes 或 APIs, 而不是只依赖 stale indexes。

Multi-turn search 需要 state: previous queries、user constraints、 retrieved sources 和 follow-up intent。

Safety 很重要, 因为 retrieved web content 是 untrusted, 可能包含 misinformation、harmful content 或 prompt injection。

Cost control 需要 caching、source deduplication、 model routing、token budgeting 和限制 expensive re-ranking。

Observability 应该追踪 query rewriting、 retrieved sources、ranking decisions、 context sent to model、citations、 latency、cost 和 user feedback。

核心原则是: AI search quality depends on retrieval、 ranking、grounding 和 citations, not just the LLM。


⭐ Final Insight

Perplexity-style AI Search 的核心不是:

“Search + LLM 总结”

而是:

Query Understanding

  • Query Rewriting
  • Retrieval
  • Source Ranking
  • Content Extraction
  • Context Building
  • Grounded Generation
  • Citation Mapping
  • Freshness Detection
  • Safety
  • Observability。

最重要的一句话:

AI search quality depends on retrieval, ranking, grounding, and citations, not just the LLM.


Implement