System Design Deep Dive - 07 Design Search System

Post by ailswan Apr. 24, 2026


🎯 Design Search System


1️⃣ Core Framework

When discussing Search System design, I frame it as:

  1. Core flows: ingest document, index document, search query
  2. Query understanding and normalization
  3. Retrieval strategy: keyword, filters, vector search
  4. Ranking and personalization
  5. Indexing strategy and freshness
  6. Scaling, sharding, caching, and replication
  7. Trade-offs: latency vs relevance vs freshness
  8. Failure handling and consistency

2️⃣ Core Requirements


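Functional Requirements

- Ingest documents and build searchable indexes
- Retrieve candidate results for a query
- Rank results by relevance
- Suggest completions as the user types (autocomplete)

Non-functional Requirements

- Low query latency
- Index freshness
- Scalability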

👉 Interview Answer

A search system needs to ingest content, build searchable indexes, retrieve candidate results, and rank them by relevance.

The main challenge is balancing latency, relevance, freshness, and scalability.

Search is not a single algorithm; it is a pipeline of indexing, retrieval, ranking, and serving.


3️⃣ Main APIs


Index Document

POST /api/index

Request:

{
  "documentId": "doc123",
  "title": "Distributed Systems Basics",
  "body": "This article explains replication and sharding.",
  "tags": ["system-design", "database"],
  "createdAt": "2026-05-02T10:00:00Z"
}

Search

GET /api/search?q=sharding&limit=20&cursor=xxx

Response:

{
  "results": [
    {
      "documentId": "doc123",
      "title": "Distributed Systems Basics",
      "snippet": "This article explains replication and sharding.",
      "score": 0.92
    }
  ],
  "nextCursor": "abc"
}

Autocomplete

GET /api/autocomplete?q=shar

👉 Interview Answer

I would expose APIs for indexing documents, searching documents, and supporting autocomplete.

Search APIs should be optimized for low latency, while indexing APIs can usually be asynchronous, which improves write throughput at the cost of a short delay before a document becomes searchable.


4️⃣ Data Model


Document Table

document (
  document_id VARCHAR PRIMARY KEY,
  title TEXT,
  body TEXT,
  author_id VARCHAR,
  created_at TIMESTAMP,
  updated_at TIMESTAMP,
  status VARCHAR,
  metadata JSON
)

Inverted Index

term → posting list

Example:

"sharding" → [
  { documentId: "doc123", frequency: 3, positions: [10, 42, 90] },
  { documentId: "doc456", frequency: 1, positions: [15] }
]

Forward Index

document_id → terms / metadata

Used for:

- Highlighting matched terms in snippets
- Re-indexing

Ranking Feature Store

ranking_feature (
  document_id VARCHAR PRIMARY KEY,
  click_count BIGINT,
  view_count BIGINT,
  freshness_score DOUBLE,
  quality_score DOUBLE,
  updated_at TIMESTAMP
)

👉 Interview Answer

I would store the original documents separately from the search index.

The document store is the source of truth, while the inverted index is optimized for retrieval.

I may also maintain a forward index and ranking feature store to support highlighting, re-indexing, and ranking.
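
To make the posting-list shape above concrete, here is a minimal sketch of building an inverted index in memory. The whitespace tokenizer and the function name build_inverted_index are my own simplifications for illustration; a real analyzer would do much more.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to a posting list of {documentId, frequency, positions}."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        positions = defaultdict(list)
        # Naive tokenizer: lowercase, split on whitespace, trim punctuation.
        for pos, token in enumerate(text.lower().split()):
            positions[token.strip(".,!?")].append(pos)
        for term, pos_list in positions.items():
            if term:
                index[term].append({
                    "documentId": doc_id,
                    "frequency": len(pos_list),
                    "positions": pos_list,
                })
    return dict(index)

docs = {
    "doc123": "Sharding splits data. Sharding needs a shard key.",
    "doc456": "Replication and sharding improve scale.",
}
index = build_inverted_index(docs)
print(index["sharding"])
```

The document store stays the source of truth; this structure can always be rebuilt from it.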


5️⃣ Query Understanding


Responsibilities

- Tokenization
- Normalization
- Spelling correction
- Synonym expansion
- Intent detection

Example

"best running shoes for winter"

Can be understood as:

intent = product search
terms = running shoes, winter
filters = category: shoes
ranking boost = seasonal relevance

Advanced Techniques


👉 Interview Answer

Query understanding transforms raw user input into a structured search request.

It may include tokenization, normalization, spelling correction, synonym expansion, and intent detection.

This step is important because poor query understanding leads to poor retrieval even if the ranking model is strong.
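
As a rough illustration of turning raw input into a structured request, the sketch below normalizes, tokenizes, expands synonyms, and derives a category filter. The lookup tables (SYNONYMS, STOPWORDS, CATEGORY_TERMS) and the StructuredQuery type are hypothetical; spelling correction and intent models are left out.

```python
import re
from dataclasses import dataclass, field

# Hypothetical lookup tables; a real system would curate or learn these.
SYNONYMS = {"sneakers": ["shoes"]}
STOPWORDS = {"for", "the", "a", "best"}  # "best" is treated as noise in this sketch
CATEGORY_TERMS = {"shoes": "shoes"}

@dataclass
class StructuredQuery:
    terms: list
    expanded_terms: list
    filters: dict = field(default_factory=dict)

def understand(raw: str) -> StructuredQuery:
    # Normalize case and tokenize on alphanumeric runs.
    tokens = re.findall(r"[a-z0-9]+", raw.lower())
    terms = [t for t in tokens if t not in STOPWORDS]
    # Synonym expansion widens recall without changing the original terms.
    expanded = list(terms)
    for term in terms:
        for synonym in SYNONYMS.get(term, []):
            if synonym not in expanded:
                expanded.append(synonym)
    # Turn recognized category words into a structured filter.
    filters = {}
    for term in expanded:
        if term in CATEGORY_TERMS:
            filters["category"] = CATEGORY_TERMS[term]
    return StructuredQuery(terms, expanded, filters)

q = understand("Best running SHOES for winter!")
print(q)
```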


6️⃣ Retrieval / Candidate Generation


Goal

Retrieve a manageable candidate set quickly.

query → top N candidate documents

Usually:

N = 100 to 10,000

Retrieval Methods

Keyword Retrieval

Uses:

Good for:


Vector Retrieval

Uses:

Good for:


Filter Retrieval

Examples:

category = electronics
price < 100
created_at > last_30_days

Hybrid Retrieval

keyword retrieval + vector retrieval + filters

👉 Interview Answer

Retrieval focuses on generating a candidate set efficiently.

I would use an inverted index for keyword search, and optionally vector search for semantic matching.

In modern systems, hybrid retrieval is common, combining keyword search, vector search, and structured filters.

The goal is high recall with low latency, because ranking will refine the final ordering.
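
One way to combine the keyword and vector candidate lists is reciprocal rank fusion. That is a technique I am swapping in for illustration, not something this post prescribes. The sketch assumes each retriever already returns document IDs in rank order and that structured filters are resolved into a set of allowed IDs.

```python
def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked ID lists; k dampens the weight of top ranks."""
    scores = {}
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by fused score, breaking ties by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))

def hybrid_retrieve(keyword_hits, vector_hits, allowed_ids, top_n=100):
    fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
    # Apply structured filters, then cap the candidate set size.
    return [d for d in fused if d in allowed_ids][:top_n]

candidates = hybrid_retrieve(
    keyword_hits=["d1", "d2", "d3"],
    vector_hits=["d3", "d1", "d4"],
    allowed_ids={"d1", "d3", "d4"},
)
print(candidates)
```

Documents found by both retrievers rise to the top, which suits a recall-oriented stage where ranking refines the order later.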


7️⃣ Ranking


Goal

Order candidates by relevance.


Ranking Signals


Ranking Pipeline

Candidate Retrieval
→ Lightweight Scoring
→ ML Ranking
→ Re-ranking
→ Final Results

Multi-stage Ranking

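Stage 1: First-pass Ranking

- Cheap relevance signals such as BM25

Stage 2: ML Ranking

- More expensive ML ranking models applied to the shortlisted candidates

Stage 3: Re-ranking

Goals:

- Diversity
- Safety
- Freshness
- Business constraints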

👉 Interview Answer

Ranking determines the final ordering of search results.

I would use a multi-stage ranking pipeline. The first stage uses cheap relevance signals, while later stages use more expensive ML models.

Finally, re-ranking can enforce diversity, safety, freshness, and business constraints.
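
The three stages can be sketched as a single function. The scoring functions, the bm25 and quality fields, and the per-author cap used as a diversity rule are all assumptions chosen to keep the example small.

```python
def multi_stage_rank(candidates, cheap_score, model_score,
                     shortlist_size=3, max_per_author=1, k=2):
    # Stage 1: keep only the best candidates by a cheap signal.
    shortlist = sorted(candidates, key=cheap_score, reverse=True)[:shortlist_size]
    # Stage 2: re-score the shortlist with the expensive model.
    scored = sorted(shortlist, key=model_score, reverse=True)
    # Stage 3: re-rank for diversity by capping results per author.
    per_author, final = {}, []
    for doc in scored:
        used = per_author.get(doc["author"], 0)
        if used < max_per_author:
            final.append(doc)
            per_author[doc["author"]] = used + 1
        if len(final) == k:
            break
    return final

candidates = [
    {"id": "a", "author": "x", "bm25": 3.0, "quality": 0.2},
    {"id": "b", "author": "x", "bm25": 2.5, "quality": 0.9},
    {"id": "c", "author": "y", "bm25": 2.0, "quality": 0.5},
    {"id": "d", "author": "z", "bm25": 0.1, "quality": 1.0},
]
top = multi_stage_rank(
    candidates,
    cheap_score=lambda d: d["bm25"],
    model_score=lambda d: d["bm25"] * d["quality"],
)
print([d["id"] for d in top])
```

The expensive scorer only ever sees shortlist_size documents, which is what keeps latency bounded as the candidate set grows.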


8️⃣ Indexing Pipeline


Basic Flow

Document created / updated
→ Event published
→ Indexing worker consumes event
→ Parse and normalize document
→ Build index entries
→ Update inverted index
→ Update ranking features
→ Make document searchable

Batch Indexing

Used for:

Pros:

Cons:


Real-time Indexing

Used for:

Pros:

Cons:


Near-real-time Indexing

Recommended for many systems.

small delay, usually seconds

👉 Interview Answer

I would use an asynchronous indexing pipeline.

When a document is created or updated, the source service publishes an event.

Indexing workers consume the event, parse the document, update the inverted index, and refresh ranking features.

Most production systems use near-real-time indexing to balance freshness and cost.
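
A toy model of the near-real-time behavior: events are consumed into a buffer, and documents only become searchable at the next refresh. The class and method names are made up for this sketch, the deque stands in for a message queue, and removing stale terms on document updates is deliberately not handled.

```python
from collections import defaultdict, deque

class NearRealTimeIndex:
    def __init__(self):
        self.events = deque()               # stands in for a message queue
        self.buffer = {}                    # indexed but not yet searchable
        self.searchable = defaultdict(set)  # term -> document IDs

    def publish(self, doc_id, body):
        """Source service: emit a document-changed event."""
        self.events.append((doc_id, body))

    def consume(self):
        """Indexing worker: parse pending events into index entries."""
        while self.events:
            doc_id, body = self.events.popleft()
            self.buffer[doc_id] = set(body.lower().split())

    def refresh(self):
        """Periodic step that makes buffered documents visible to search."""
        for doc_id, terms in self.buffer.items():
            for term in terms:
                self.searchable[term].add(doc_id)
        self.buffer.clear()

    def search(self, term):
        return sorted(self.searchable.get(term.lower(), set()))

nrt = NearRealTimeIndex()
nrt.publish("doc123", "replication and sharding")
nrt.consume()
before_refresh = nrt.search("sharding")
nrt.refresh()
after_refresh = nrt.search("sharding")
print(before_refresh, after_refresh)
```

The gap between consume and refresh is the indexing lag; running refresh more often buys freshness at a higher cost.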


9️⃣ Autocomplete


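Requirements

- Extremely low latency, since it is called on every keystroke
- Heavily cached

Data Structures

- Prefix index or trie
- Popular query statistics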

Example

Input: "shar"
Suggestions:
- sharding
- shared database
- shard key

👉 Interview Answer

Autocomplete should be optimized separately from full search.

I would use a prefix index or trie, combined with popular query statistics.

Since autocomplete is called on every keystroke, it must be extremely low latency and heavily cached.
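
A minimal sketch of the trie plus popularity idea. The popularity counts are invented, and a production system would likely precompute the top suggestions per prefix rather than walk the subtree on each request.

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.count = 0  # > 0 marks the end of a known query

class Autocomplete:
    def __init__(self):
        self.root = TrieNode()

    def add(self, query, count=1):
        node = self.root
        for ch in query:
            node = node.children.setdefault(ch, TrieNode())
        node.count += count

    def suggest(self, prefix, limit=3):
        # Walk down to the node that represents the prefix.
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        # Collect every completed query below that node.
        found, stack = [], [(node, prefix)]
        while stack:
            current, text = stack.pop()
            if current.count:
                found.append((current.count, text))
            for ch, child in current.children.items():
                stack.append((child, text + ch))
        # Most popular first; ties broken alphabetically.
        found.sort(key=lambda item: (-item[0], item[1]))
        return [text for _, text in found[:limit]]

ac = Autocomplete()
ac.add("sharding", 50)
ac.add("shard key", 30)
ac.add("shared database", 20)
ac.add("replication", 99)
print(ac.suggest("shar"))
```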


🔟 Caching Strategy


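What to Cache?

- Popular query results
- Hot posting lists
- Document metadata
- Ranking features
- Autocomplete suggestions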

Cache Layers


Cache Challenges


👉 Interview Answer

Caching is important because many search queries are repeated.

I would cache popular query results, hot posting lists, document metadata, ranking features, and autocomplete suggestions.

However, caching search results is tricky because freshness, filters, and personalization can change the result set.
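
One hedged way to handle those pitfalls is to put everything that changes the result set into the cache key and to bound staleness with a TTL. The key fields and the TTLCache class below are illustrative assumptions; personalized queries would either add a user segment to the key or skip the cache.

```python
import hashlib
import json
import time

def cache_key(query, filters, ranking_version):
    """Same normalized query + filters + ranking version -> same key."""
    payload = json.dumps(
        {"q": " ".join(query.lower().split()), "f": filters, "v": ranking_version},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

class TTLCache:
    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock      # injectable so expiry can be tested
        self.entries = {}

    def get(self, key):
        hit = self.entries.get(key)
        if hit is None:
            return None
        expires_at, value = hit
        if self.clock() >= expires_at:
            del self.entries[key]  # expired: bounds how stale results can be
            return None
        return value

    def put(self, key, value):
        self.entries[key] = (self.clock() + self.ttl, value)

cache = TTLCache(ttl_seconds=30)
key = cache_key("  Sharding ", {"tag": "database"}, "v1")
cache.put(key, ["doc123"])
print(cache.get(cache_key("sharding", {"tag": "database"}, "v1")))
```

Including the ranking version in the key means a model rollout naturally invalidates old results.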


1️⃣1️⃣ Sharding and Replication


Why Shard?


Sharding Strategies

Document-based Sharding

document_id hash → shard

Pros:

Cons:


Term-based Sharding

term → shard

Pros:

Cons:


Hybrid Sharding

Used in larger systems.


Replication

Used for:

- Availability
- Read throughput

👉 Interview Answer

I would shard the index horizontally, commonly by document ID, because it gives a balanced distribution.

Search queries may fan out to multiple shards, and each shard returns top K results.

The coordinator then merges and ranks results globally.

Replication improves availability and read throughput.
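
Document-based sharding can be sketched in a few lines. I am assuming a stable hash (SHA-256 here) rather than Python's built-in hash(), which is randomized per process for strings; the function name shard_for is hypothetical.

```python
import hashlib

def shard_for(document_id: str, num_shards: int) -> int:
    """Map a document ID to a shard using a stable hash."""
    digest = hashlib.sha256(document_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

# Count how 1,000 synthetic IDs spread over 4 shards.
counts = [0] * 4
for i in range(1000):
    counts[shard_for(f"doc{i}", 4)] += 1
print(counts)
```

Plain modulo placement reshuffles most documents when the shard count changes, so systems that resize often tend to use consistent hashing or a fixed number of virtual shards instead.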


1️⃣2️⃣ Query Serving Flow


Flow

User query
→ API Gateway
→ Search Service
→ Query Understanding
→ Query Planner
→ Fan out to index shards
→ Retrieve candidates
→ Merge results
→ Rank / re-rank
→ Fetch document metadata
→ Return results

Shard Merge

Each shard returns:

top K local results

Coordinator merges:

global top K results

👉 Interview Answer

For query serving, the search service first understands the query, then fans out the request to relevant index shards.

Each shard returns local top results.

A coordinator merges those results, applies ranking or re-ranking, fetches metadata, and returns the final response.
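
The merge step at the coordinator can be sketched with a heap. I am assuming each shard returns (score, documentId) pairs and that scores are comparable across shards, which in practice requires shared scoring statistics or a global re-rank.

```python
import heapq

def merge_top_k(shard_results, k):
    """Merge per-shard top-K lists into a global top K by score."""
    all_hits = (hit for hits in shard_results for hit in hits)
    return heapq.nlargest(k, all_hits, key=lambda hit: hit[0])

shard_results = [
    [(0.92, "doc123"), (0.40, "doc7")],   # shard 0
    [(0.75, "doc456")],                   # shard 1
    [],                                   # shard 2 had no matches
]
print(merge_top_k(shard_results, k=2))
```

Each shard must return its own top K because any document in the global top K is necessarily in the local top K of the shard that holds it.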


1️⃣3️⃣ Trade-offs


Latency vs Relevance


Freshness vs Cost


Recall vs Precision


Personalization vs Cacheability


Consistency vs Availability


👉 Interview Answer

The main trade-offs are latency, relevance, freshness, recall, precision, and cost.

Larger candidate sets and stronger ranking models improve relevance, but they increase latency.

Real-time indexing improves freshness, but increases system complexity.

In most search systems, eventual consistency is acceptable for index updates.


1️⃣4️⃣ Failure Handling


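Common Failures

- Index shard unavailable
- Ranking service down

Strategies

- Return partial results when a shard is unavailable
- Fall back to BM25 or simple relevance scoring when the ranking service is down
- Rebuild the index from the document store if necessary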

👉 Interview Answer

Search systems should degrade gracefully.

If one shard is unavailable, the system may return partial results instead of failing the whole query.

If the ranking service is down, we can fall back to BM25 or simple relevance scoring.

Since the document store is the source of truth, the index can be rebuilt if necessary.
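
Both degradation paths can be sketched with the standard library. Shards are modeled as plain callables, and the response fields partial and failedShards are names I made up; the thread pool and timeout value are assumptions as well.

```python
from concurrent.futures import ThreadPoolExecutor, wait

def search_with_degradation(shards, query, timeout_s=1.0):
    """Fan out to every shard and keep whatever comes back in time."""
    pool = ThreadPoolExecutor(max_workers=len(shards))
    futures = [pool.submit(shard, query) for shard in shards]
    done, not_done = wait(futures, timeout=timeout_s)
    hits, failed = [], len(not_done)  # timed-out shards count as failed
    for future in done:
        try:
            hits.extend(future.result())
        except Exception:
            failed += 1               # shard raised: skip it, do not fail the query
    pool.shutdown(wait=False, cancel_futures=True)  # requires Python 3.9+
    hits.sort(reverse=True)
    return {"results": hits, "partial": failed > 0, "failedShards": failed}

def rank_with_fallback(candidates, ml_rank, simple_rank):
    """Use the ML ranker when it works, otherwise a cheap ordering."""
    try:
        return ml_rank(candidates)
    except Exception:
        return simple_rank(candidates)

def healthy_shard(query):
    return [(0.9, "doc123")]

def broken_shard(query):
    raise ConnectionError("shard down")

response = search_with_degradation([healthy_shard, broken_shard], "sharding")
print(response)
```

Surfacing the partial flag lets clients and dashboards tell a degraded answer apart from a genuinely small result set.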


1️⃣5️⃣ Consistency Model


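Stronger Consistency Needed For

- Deleted content
- Blocked content
- Permission-sensitive content

Eventual Consistency Acceptable For

- Newly created documents becoming searchable
- Index updates in general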

👉 Interview Answer

The search index usually does not need strong consistency.

It is acceptable if a newly created document appears in search after a short delay.

However, deleted, blocked, or permission-sensitive content needs stronger correctness, so I would apply read-time permission checks before returning results.


1️⃣6️⃣ Security and Access Control


Why It Matters

Search may expose sensitive content.

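Examples:

- Private content
- Deleted content
- Blocked content

Strategies

- Index permissions alongside documents
- Perform read-time permission checks before returning results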

👉 Interview Answer

Search must enforce access control carefully.

Even if permissions are indexed, I would still perform read-time permission checks for sensitive content.

This prevents stale index entries from exposing private, deleted, or blocked content.
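
A read-time check can be sketched as a filter over the ranked hits that consults the source of truth rather than the index. The status values and the visibility and allowed_users fields are hypothetical; only the status column appears in the data model above.

```python
def filter_visible(results, user_id, load_document):
    """Drop hits the user may not see, based on the current document state."""
    visible = []
    for hit in results:
        doc = load_document(hit["documentId"])  # source of truth, not the index
        if doc is None or doc["status"] != "active":
            continue  # deleted or blocked since it was indexed
        if doc["visibility"] == "public" or user_id in doc["allowed_users"]:
            visible.append(hit)
    return visible

store = {
    "doc1": {"status": "active", "visibility": "public", "allowed_users": []},
    "doc2": {"status": "deleted", "visibility": "public", "allowed_users": []},
    "doc3": {"status": "active", "visibility": "private", "allowed_users": ["alice"]},
}
hits = [{"documentId": d} for d in ("doc1", "doc2", "doc3", "doc4")]
print(filter_visible(hits, "bob", store.get))
```

Because filtering can shrink a page, a caller would typically over-fetch candidates or keep paging until enough visible results are collected.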


1️⃣7️⃣ Observability


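Key Metrics

- Query latency
- Indexing lag
- Empty result rate
- Shard failures
- Cache hit rate
- Ranking latency
- Click-through rate
- Successful search sessions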

Logs

Track:

query_id
user_id
query_text
filters
shards_queried
latency
result_count
ranking_version

👉 Interview Answer

Observability is critical for search quality and reliability.

I would monitor query latency, indexing lag, empty result rate, shard failures, cache hit rate, and ranking latency.

I would also track relevance metrics such as click-through rate and successful search sessions.
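
As a small illustration, the log fields listed above can be emitted as one structured line per query and then aggregated into metrics. The latency_ms field name and the nearest-rank p95 calculation are my assumptions for this sketch.

```python
import json
import math

def log_line(**fields):
    """Serialize one query's log record as a single JSON line."""
    return json.dumps(fields, sort_keys=True)

def summarize(records):
    """Compute p95 latency (nearest rank) and the empty result rate."""
    latencies = sorted(r["latency_ms"] for r in records)
    p95_index = max(0, math.ceil(0.95 * len(latencies)) - 1)
    empty = sum(1 for r in records if r["result_count"] == 0)
    return {
        "p95_latency_ms": latencies[p95_index],
        "empty_result_rate": empty / len(records),
    }

records = [
    {"query_id": "q1", "latency_ms": 10, "result_count": 3, "ranking_version": "v1"},
    {"query_id": "q2", "latency_ms": 20, "result_count": 0, "ranking_version": "v1"},
    {"query_id": "q3", "latency_ms": 30, "result_count": 5, "ranking_version": "v1"},
    {"query_id": "q4", "latency_ms": 40, "result_count": 0, "ranking_version": "v1"},
]
print(log_line(**records[0]))
print(summarize(records))
```

Logging ranking_version on every query is what makes it possible to compare relevance metrics across model rollouts.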


1️⃣8️⃣ End-to-End Flow


Indexing Flow

Document created
→ Event published
→ Indexing worker parses document
→ Update inverted index
→ Update ranking features
→ Document becomes searchable

Search Flow

User submits query
→ Query understanding
→ Retrieve candidates from shards
→ Merge results
→ Rank / re-rank
→ Apply permission checks
→ Return results

Autocomplete Flow

User types prefix
→ Query autocomplete index
→ Rank suggestions by popularity
→ Return suggestions

Key Insight

Search is a pipeline system, not just a database query.


🧠 Staff-Level Answer (Final)


👉 Interview Answer (Full Version)

When designing a search system, I think of it as a pipeline of indexing, retrieval, ranking, and serving.

Documents are stored in a source-of-truth document store, while the search index is a read-optimized structure used for fast retrieval.

For query understanding, I would normalize the query, tokenize it, apply typo correction, synonym expansion, and potentially detect user intent.

For retrieval, I would use an inverted index for keyword search, and optionally vector search for semantic matching. In modern systems, hybrid retrieval is common, combining keyword search, vector search, and filters.

Ranking is usually multi-stage. First, we retrieve candidates using cheap signals like BM25. Then we apply more expensive ranking models, and finally re-rank results for diversity, freshness, safety, and business constraints.

For indexing, I would use an asynchronous near-real-time pipeline. Document updates publish events, indexing workers update the inverted index, and the system keeps indexing lag low.

To scale, I would shard indexes, replicate shards for availability, cache hot queries and posting lists, and use a coordinator to merge results from multiple shards.

The main trade-offs are latency, relevance, freshness, recall, precision, and cost.

Search index updates can usually be eventually consistent, but deleted, blocked, or permission-sensitive content needs stronger correctness through read-time checks.

Ultimately, the goal is to return relevant results quickly, while keeping the index fresh, scalable, and reliable.


⭐ Final Insight

The core of a search system is not a simple database lookup; it is returning the most relevant results quickly over large-scale data through indexing, retrieval, and ranking.



