d&d-t System Design Deep Dive

🎯 Design Search System

1️⃣ Core Framework

When discussing Search System design, I frame it as:

Core flows: ingest document, index document, search query
Query understanding and normalization
Retrieval strategy: keyword, filters, vector search
Ranking and personalization
Indexing strategy and freshness
Scaling, sharding, caching, and replication
Trade-offs: latency vs relevance vs freshness
Failure handling and consistency

2️⃣ Core Requirements

Functional Requirements

User can search documents / products / posts
Support keyword search
Support filters and sorting
Support autocomplete
Support typo tolerance
Support ranking by relevance
Support freshness for newly created content
Support pagination

Non-functional Requirements

Low search latency
High availability
High recall and precision
Scalable indexing pipeline
Near-real-time index updates
Support high query QPS
Graceful degradation during partial failures

👉 Interview Answer

A search system needs to ingest content, build searchable indexes, retrieve candidate results, and rank them by relevance.

The main challenge is balancing latency, relevance, freshness, and scalability.

Search is not a single algorithm; it is a pipeline of indexing, retrieval, ranking, and serving.

3️⃣ Main APIs

Index Document

POST /api/index

Request:

{
  "documentId": "doc123",
  "title": "Distributed Systems Basics",
  "body": "This article explains replication and sharding.",
  "tags": ["system-design", "database"],
  "createdAt": "2026-05-02T10:00:00Z"
}

Search

GET /api/search?q=sharding&limit=20&cursor=xxx

Response:

{
  "results": [
    {
      "documentId": "doc123",
      "title": "Distributed Systems Basics",
      "snippet": "This article explains replication and sharding.",
      "score": 0.92
    }
  ],
  "nextCursor": "abc"
}

Autocomplete

GET /api/autocomplete?q=shar

👉 Interview Answer

I would expose APIs for indexing documents, searching documents, and supporting autocomplete.

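Search APIs should be optimized for low latency, while indexing APIs can usually be asynchronous: the write is acknowledged once the event is queued, which improves write throughput at the cost of a short delay before the document becomes searchable.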

4️⃣ Data Model

Document Table

document (
  document_id VARCHAR PRIMARY KEY,
  title TEXT,
  body TEXT,
  author_id VARCHAR,
  created_at TIMESTAMP,
  updated_at TIMESTAMP,
  status VARCHAR,
  metadata JSON
)

Inverted Index

term → posting list

Example:

"sharding" → [
  { documentId: "doc123", frequency: 3, positions: [10, 42, 90] },
  { documentId: "doc456", frequency: 1, positions: [15] }
]
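
To make the posting-list shape above concrete, here is a minimal sketch of building such an index in memory. The `tokenize` helper and the sample documents are assumptions made for illustration, and positions here are token offsets rather than the character offsets a real engine might store.

```python
from collections import defaultdict

PUNCTUATION = ".,!?;:\"'()"

def tokenize(text):
    # Hypothetical tokenizer: split on whitespace, strip basic punctuation, lowercase.
    tokens = (raw.strip(PUNCTUATION).lower() for raw in text.split())
    return [t for t in tokens if t]

def build_inverted_index(docs):
    """Map each term to a posting list of {documentId, frequency, positions}."""
    positions = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, term in enumerate(tokenize(text)):
            positions[term][doc_id].append(pos)
    return {
        term: [
            {"documentId": doc_id, "frequency": len(pos_list), "positions": pos_list}
            for doc_id, pos_list in sorted(by_doc.items())
        ]
        for term, by_doc in positions.items()
    }

# Illustrative documents, not taken from any real corpus.
docs = {
    "doc123": "This article explains replication and sharding. Sharding splits data.",
    "doc456": "Replication copies data across nodes.",
}
index = build_inverted_index(docs)
print(index["sharding"])
```

A lookup for a term is then a single dictionary access, which is why keyword retrieval over an inverted index stays fast even when the corpus is large.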

Forward Index

document_id → terms / metadata

Used for:

Re-indexing
Highlighting
Debugging
Feature extraction

Ranking Feature Store

ranking_feature (
  document_id VARCHAR PRIMARY KEY,
  click_count BIGINT,
  view_count BIGINT,
  freshness_score DOUBLE,
  quality_score DOUBLE,
  updated_at TIMESTAMP
)

👉 Interview Answer

I would store the original documents separately from the search index.

The document store is the source of truth, while the inverted index is optimized for retrieval.

I may also maintain a forward index and ranking feature store to support highlighting, re-indexing, and ranking.

5️⃣ Query Understanding

Responsibilities

Tokenization
Lowercasing
Stop word removal
Stemming / lemmatization
Synonym expansion
Typo correction
Intent understanding

Example

"best running shoes for winter"

Can be understood as:

intent = product search
terms = running shoes, winter
filters = category: shoes
ranking boost = seasonal relevance

Advanced Techniques

Query rewriting
Spell correction
Synonym dictionary
Entity extraction
Language detection
Personalization context

👉 Interview Answer

Query understanding transforms raw user input into a structured search request.

It may include tokenization, normalization, spelling correction, synonym expansion, and intent detection.

This step is important because poor query understanding leads to poor retrieval even if the ranking model is strong.
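
The steps above can be sketched as a small normalization function. This is only an illustration: the stop-word list, synonym dictionary, and category mapping are tiny assumed tables, and `StructuredQuery` is a hypothetical name, not part of any library.

```python
from dataclasses import dataclass, field

# Assumed lookup tables; a real system would load much larger, curated ones.
STOP_WORDS = {"for", "the", "a", "an", "of", "in"}
SYNONYMS = {"shoes": ["sneakers"]}
CATEGORY_TERMS = {"shoes": "shoes", "sneakers": "shoes"}

@dataclass
class StructuredQuery:
    terms: list
    expanded_terms: list
    filters: dict = field(default_factory=dict)

def understand(raw_query):
    # Lowercase, tokenize on whitespace, and drop stop words.
    tokens = [t for t in raw_query.lower().split() if t not in STOP_WORDS]
    # Synonym expansion: keep the original terms first, then add synonyms once.
    expanded = list(tokens)
    for term in tokens:
        for synonym in SYNONYMS.get(term, []):
            if synonym not in expanded:
                expanded.append(synonym)
    # Very rough intent detection: a known category term becomes a filter.
    filters = {}
    for term in tokens:
        if term in CATEGORY_TERMS:
            filters["category"] = CATEGORY_TERMS[term]
    return StructuredQuery(terms=tokens, expanded_terms=expanded, filters=filters)

q = understand("Best running shoes for winter")
print(q)
```

Stemming, spell correction, and language detection are left out here to keep the sketch short; each would slot in as another step before the terms are returned.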

6️⃣ Retrieval / Candidate Generation

Goal

Retrieve a manageable candidate set quickly.

query → top N candidate documents

Usually:

N = 100 to 10,000

Retrieval Methods

Keyword Retrieval

Uses:

Inverted index
BM25
TF-IDF

Good for:

Exact keyword matching
Structured search
High precision

Vector Retrieval

Uses:

Embeddings
ANN index
HNSW / IVF / PQ

Good for:

Semantic search
Natural language queries
Synonym-like matching

Filter Retrieval

Examples:

category = electronics
price < 100
created_at > last_30_days

Hybrid Retrieval

keyword retrieval + vector retrieval + filters

👉 Interview Answer

Retrieval focuses on generating a candidate set efficiently.

I would use an inverted index for keyword search, and optionally vector search for semantic matching.

In modern systems, hybrid retrieval is common, combining keyword search, vector search, and structured filters.

The goal is high recall with low latency, because ranking will refine the final ordering.
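
As a rough illustration of keyword retrieval, the sketch below scores a tiny in-memory corpus with one common BM25 variant and keeps the top N hits. The parameter defaults (`k1=1.2`, `b=0.75`) and the sample documents are assumptions; a production engine would read term statistics from the inverted index instead of rescanning every document.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score every document against the query terms (assumes docs is non-empty)."""
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized.values()) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in tokenized.values():
        df.update(set(tokens))
    scores = {}
    for doc_id, tokens in tokenized.items():
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores[doc_id] = score
    return scores

def retrieve(query_terms, docs, n=100):
    # Candidate generation: keep only matching documents, best first.
    hits = [(doc_id, s) for doc_id, s in bm25_scores(query_terms, docs).items() if s > 0]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)[:n]

docs = {
    "d1": "sharding splits an index across nodes",
    "d2": "replication copies data",
    "d3": "sharding and replication and sharding again",
}
candidates = retrieve(["sharding"], docs, n=10)
print(candidates)
```

In a hybrid setup, a list like `candidates` would be merged with the output of a vector search and then narrowed by structured filters before ranking.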

7️⃣ Ranking

Goal

Order candidates by relevance.

Ranking Signals

Text relevance score
Query-document match
Click-through rate
Freshness
Popularity
User personalization
Geographic relevance
Business rules
Content quality
Safety signals

Ranking Pipeline

Candidate Retrieval
→ Lightweight Scoring
→ ML Ranking
→ Re-ranking
→ Final Results

Multi-stage Ranking

Stage 1: First-pass Ranking

BM25 score
Simple freshness boost
Cheap features

Stage 2: ML Ranking

Gradient boosted trees
Neural ranking model
Learning-to-rank model

Stage 3: Re-ranking

Goals:

Diversity
Freshness
Deduplication
Safety filtering
Business constraints

👉 Interview Answer

Ranking determines the final ordering of search results.

I would use a multi-stage ranking pipeline. The first stage uses cheap relevance signals, while later stages use more expensive ML models.

Finally, re-ranking can enforce diversity, safety, freshness, and business constraints.
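
A minimal sketch of the three stages, assuming each candidate is a dict with precomputed `bm25`, `freshness`, and `ctr` fields. The field names, the weights, and `toy_model` are all made up for illustration; in practice stage 2 would call a trained learning-to-rank model.

```python
def first_pass(candidates, keep=3):
    # Stage 1: cheap score (BM25 plus a small freshness boost), keep the best few.
    scored = sorted(
        candidates,
        key=lambda c: c["bm25"] + 0.1 * c["freshness"],
        reverse=True,
    )
    return scored[:keep]

def ml_rank(candidates, model):
    # Stage 2: a more expensive model, run only on the survivors of stage 1.
    return sorted(candidates, key=model, reverse=True)

def rerank(candidates, blocked_ids):
    # Stage 3: safety filtering and deduplication by title.
    seen_titles = set()
    final = []
    for c in candidates:
        if c["id"] in blocked_ids or c["title"] in seen_titles:
            continue
        seen_titles.add(c["title"])
        final.append(c)
    return final

def toy_model(c):
    # Hypothetical stand-in for a learned ranker.
    return 0.6 * c["bm25"] + 0.4 * c["ctr"]

candidates = [
    {"id": "a", "title": "Sharding 101", "bm25": 2.0, "freshness": 0.1, "ctr": 0.1},
    {"id": "b", "title": "Shard keys", "bm25": 1.8, "freshness": 0.9, "ctr": 0.9},
    {"id": "c", "title": "Sharding 101", "bm25": 1.7, "freshness": 0.5, "ctr": 0.2},
    {"id": "d", "title": "Misc", "bm25": 0.3, "freshness": 1.0, "ctr": 0.9},
]
results = rerank(ml_rank(first_pass(candidates), toy_model), blocked_ids=set())
print([c["id"] for c in results])
```

The point of the staging is cost control: the expensive model only ever sees the handful of candidates that the cheap score lets through.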

8️⃣ Indexing Pipeline

Basic Flow

Document created / updated
→ Event published
→ Indexing worker consumes event
→ Parse and normalize document
→ Build index entries
→ Update inverted index
→ Update ranking features
→ Make document searchable

Batch Indexing

Used for:

Large backfills
Offline rebuilds
Reprocessing all documents

Pros:

Efficient
Easy to optimize

Cons:

Less fresh

Real-time Indexing

Used for:

Newly created documents
Fresh content
Time-sensitive search

Pros:

High freshness

Cons:

More complex
More expensive

Near-real-time Indexing

Recommended for many systems.

Documents become searchable after a small delay, usually seconds.

👉 Interview Answer

I would use an asynchronous indexing pipeline.

When a document is created or updated, the source service publishes an event.

Indexing workers consume the event, parse the document, update the inverted index, and refresh ranking features.

Most production systems use near-real-time indexing to balance freshness and cost.
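
The event-driven flow can be sketched with an in-process queue standing in for the message broker. Everything here is an assumption for illustration: the event shape, the simplified term-to-document-ID index, and the plain list used as a dead-letter queue. Handling updates (removing a document's old terms) is deliberately left out.

```python
import queue

def index_document(index, doc):
    # Parse and normalize the document, then update the inverted index.
    if not isinstance(doc.get("body"), str):
        raise ValueError("document has no text body")
    for term in set(doc["body"].lower().split()):
        index.setdefault(term, set()).add(doc["documentId"])

def run_indexing_worker(events, index, dead_letters):
    """Drain the event queue; documents that fail go to the dead-letter list."""
    while True:
        try:
            doc = events.get_nowait()
        except queue.Empty:
            return
        try:
            index_document(index, doc)
        except Exception as exc:
            # One bad document must not stall the rest of the pipeline.
            dead_letters.append((doc, str(exc)))

events = queue.Queue()
events.put({"documentId": "doc123", "body": "replication and sharding"})
events.put({"documentId": "bad1", "body": None})
events.put({"documentId": "doc456", "body": "sharding strategies"})

index, dead_letters = {}, []
run_indexing_worker(events, index, dead_letters)
print(sorted(index["sharding"]), len(dead_letters))
```

Because the source service only publishes an event and returns, write latency is decoupled from indexing cost; indexing lag is then just the time an event spends waiting in the queue plus processing time.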

9️⃣ Autocomplete

Requirements

Return suggestions as user types
Very low latency
Handle popular queries
Handle typo tolerance

Data Structures

Trie
Prefix index
N-gram index
Popular query cache

Example

Input: "shar"
Suggestions:
- sharding
- shared database
- shard key

👉 Interview Answer

Autocomplete should be optimized separately from full search.

I would use a prefix index or trie, combined with popular query statistics.

Since autocomplete is called on every keystroke, it must be extremely low latency and heavily cached.
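
One way to combine a trie with popularity statistics is sketched below. The class names and the query counts are invented for the example; typo tolerance and caching are not covered.

```python
import heapq

class TrieNode:
    def __init__(self):
        self.children = {}
        self.count = 0  # > 0 means a complete query ends at this node

class AutocompleteIndex:
    def __init__(self):
        self.root = TrieNode()

    def add(self, query, count=1):
        node = self.root
        for ch in query.lower():
            node = node.children.setdefault(ch, TrieNode())
        node.count += count

    def suggest(self, prefix, k=3):
        # Walk down to the node that represents the prefix.
        node = self.root
        for ch in prefix.lower():
            node = node.children.get(ch)
            if node is None:
                return []
        # Collect every completion below that node.
        found = []
        stack = [(node, prefix.lower())]
        while stack:
            current, text = stack.pop()
            if current.count > 0:
                found.append((current.count, text))
            for ch, child in current.children.items():
                stack.append((child, text + ch))
        # Most popular first; ties broken alphabetically.
        top = heapq.nsmallest(k, found, key=lambda item: (-item[0], item[1]))
        return [text for _, text in top]

ac = AutocompleteIndex()
for text, count in [("sharding", 120), ("shard key", 80), ("shared database", 45),
                    ("sharp", 5), ("replication", 200)]:
    ac.add(text, count)
print(ac.suggest("shar"))
```

Walking the whole subtree on every keystroke is fine for a sketch; a common refinement is to precompute the top K suggestions at each node so a lookup costs only the length of the prefix.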

🔟 Caching Strategy

What to Cache?

Popular query results
Posting lists for hot terms
Ranking features
Document metadata
Autocomplete suggestions
User personalization features

Cache Layers

Local cache
Redis / Memcached
CDN for public search pages
Search engine internal cache

Cache Challenges

Freshness
Personalization
Filter combinations
Stale ranking features

👉 Interview Answer

Caching is important because many search queries are repeated.

I would cache popular query results, hot posting lists, document metadata, ranking features, and autocomplete suggestions.

However, caching search results is tricky because freshness, filters, and personalization can change the result set.
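
The sketch below shows why filters and personalization complicate result caching: both have to be part of the cache key, and entries need a TTL so stale results eventually expire. `cache_key`, `TTLCache`, and the `user_segment` parameter are hypothetical names; in practice a shared cache such as Redis would usually play the role of `TTLCache`.

```python
import time

def cache_key(query, filters, user_segment="anon"):
    # Normalize whitespace and case, and sort filters, so equivalent requests share a key.
    normalized = " ".join(query.lower().split())
    filter_part = "&".join(f"{name}={filters[name]}" for name in sorted(filters))
    return f"{user_segment}|{normalized}|{filter_part}"

class TTLCache:
    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self.store = {}

    def get(self, key):
        entry = self.store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self.store[key]  # expired: treat as a miss
            return None
        return value

    def put(self, key, value):
        self.store[key] = (value, self.clock() + self.ttl)

cache = TTLCache(ttl_seconds=30)
key = cache_key("  Sharding ", {"category": "database", "lang": "en"})
cache.put(key, ["doc123"])
print(key, cache.get(key))
```

Keying on a coarse user segment instead of a user ID is one way to keep some personalization while preserving a usable hit rate; fully per-user results are rarely worth caching at this layer.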

1️⃣1️⃣ Sharding and Replication

Why Shard?

Index may be too large for one node
Query QPS may be too high
Need horizontal scale

Sharding Strategies

Document-based Sharding

document_id hash → shard

Pros:

Even distribution
Easy to scale

Cons:

Query must fan out to all shards, since any shard may hold matching documents

Term-based Sharding

term → shard

Pros:

Query can hit fewer shards for some terms

Cons:

Hot terms can create hot shards

Hybrid Sharding

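Used in larger systems, which may combine both approaches, for example routing by tenant or time range first and then hashing by document ID within each partition.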

Replication

Used for:

High availability
Read scaling
Failover

👉 Interview Answer

I would shard the index horizontally, commonly by document ID, because it gives a balanced distribution.

Search queries may fan out to multiple shards, and each shard returns top K results.

The coordinator then merges and ranks results globally.

Replication improves availability and read throughput.
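
Document-based sharding can be sketched as a stable hash of the document ID modulo the shard count. `shard_for` is a hypothetical helper; MD5 is used here only as a stable, well-distributed hash (not for security), because Python's built-in `hash()` for strings changes between processes.

```python
import hashlib
from collections import Counter

def shard_for(document_id, num_shards):
    # Stable hash: the same ID maps to the same shard on every machine and restart.
    digest = hashlib.md5(document_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

# Route some synthetic document IDs and look at the balance across 4 shards.
distribution = Counter(shard_for(f"doc{i}", 4) for i in range(10_000))
print(sorted(distribution.items()))
```

One caveat of plain modulo routing is that changing `num_shards` remaps most documents, so resharding usually means rebuilding or migrating the index.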

1️⃣2️⃣ Query Serving Flow

Flow

User query
→ API Gateway
→ Search Service
→ Query Understanding
→ Query Planner
→ Fan out to index shards
→ Retrieve candidates
→ Merge results
→ Rank / re-rank
→ Fetch document metadata
→ Return results

Shard Merge

Each shard returns:

top K local results

Coordinator merges:

global top K results

👉 Interview Answer

For query serving, the search service first understands the query, then fans out the request to relevant index shards.

Each shard returns local top results.

A coordinator merges those results, applies ranking or re-ranking, fetches metadata, and returns the final response.
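
The merge step at the coordinator can be sketched with a k-way merge from the standard library. It assumes each shard returns `(score, document_id)` pairs already sorted by score descending, and that scores are comparable across shards; the sample hits are made up.

```python
import heapq

def merge_top_k(shard_results, k):
    """Merge per-shard result lists (each sorted by score descending) into a global top K."""
    # heapq.merge needs ascending order, so merge on the negated score.
    merged = heapq.merge(*shard_results, key=lambda hit: -hit[0])
    # Take only the first k hits from the lazy merged stream.
    return [hit for _, hit in zip(range(k), merged)]

shard_results = [
    [(0.92, "doc123"), (0.40, "doc7")],   # shard 0
    [(0.88, "doc456"), (0.85, "doc9")],   # shard 1
    [],                                   # shard 2 had no matches
]
top = merge_top_k(shard_results, k=3)
print(top)
```

For the global top K to be exact, each shard has to return at least its own top K, because in the worst case all K winners live on a single shard.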

1️⃣3️⃣ Trade-offs

Latency vs Relevance

More ranking features improve relevance
But increase latency

Freshness vs Cost

Real-time indexing improves freshness
But costs more and increases complexity

Recall vs Precision

Large candidate set improves recall
But admits more low-relevance candidates and makes ranking slower

Personalization vs Cacheability

Personalized results improve relevance
But reduce cache hit rate

Consistency vs Availability

Search results can be eventually consistent
Query serving should remain highly available

👉 Interview Answer

The main trade-offs are latency, relevance, freshness, recall, precision, and cost.

Larger candidate sets and stronger ranking models improve relevance, but they increase latency.

Real-time indexing improves freshness, but increases system complexity.

In most search systems, eventual consistency is acceptable for index updates.

1️⃣4️⃣ Failure Handling

Common Failures

Index shard unavailable
Indexing pipeline delayed
Ranking service down
Cache unavailable
Stale index
Hot query overload
Bad document causing indexing failure

Strategies

Return partial results
Retry shard query
Use replica shard
Fall back to simple ranking
Use cached results
Dead-letter queue for indexing failures
Rebuild index from source of truth

👉 Interview Answer

Search systems should degrade gracefully.

If one shard is unavailable, the system may return partial results instead of failing the whole query.

If the ranking service is down, we can fall back to BM25 or simple relevance scoring.

Since the document store is the source of truth, the index can be rebuilt if necessary.
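
A minimal sketch of returning partial results when a shard fails, using a thread pool for the fan-out. The shard callables, the `timeout_s` default, and the response fields `partial` and `failedShards` are assumptions for illustration. Note that the timeout is applied per shard as the coordinator waits on each future in turn, not as one overall deadline.

```python
from concurrent.futures import ThreadPoolExecutor

def search_with_degradation(shards, query, k=10, timeout_s=0.5):
    """Query all shards in parallel; skip shards that raise or time out."""
    hits, failed = [], []
    pool = ThreadPoolExecutor(max_workers=max(1, len(shards)))
    try:
        futures = {name: pool.submit(fn, query) for name, fn in shards.items()}
        for name, future in futures.items():
            try:
                hits.extend(future.result(timeout=timeout_s))
            except Exception:
                # Shard error or timeout: record it and keep serving the rest.
                failed.append(name)
    finally:
        # Do not block the response on stragglers.
        pool.shutdown(wait=False)
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return {"results": hits[:k], "partial": bool(failed), "failedShards": failed}

def healthy_shard(query):
    return [(0.9, "doc123"), (0.4, "doc7")]

def broken_shard(query):
    raise ConnectionError("shard unavailable")

response = search_with_degradation(
    {"shard-0": healthy_shard, "shard-1": broken_shard}, "sharding", k=5
)
print(response)
```

Surfacing a `partial` flag lets the caller decide whether to show the results as-is, retry against a replica, or fall back to a cached response.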

1️⃣5️⃣ Consistency Model

Stronger Consistency Needed For

Source document storage
Access control / permission checks
Deleted or blocked content
Compliance-sensitive content removal

Eventual Consistency Acceptable For

Search index updates
Ranking feature updates
Popularity counters
Analytics
Autocomplete suggestions

👉 Interview Answer

The search index usually does not need strong consistency.

It is acceptable if a newly created document appears in search after a short delay.

However, deleted, blocked, or permission-sensitive content needs stronger correctness, so I would apply read-time permission checks before returning results.

1️⃣6️⃣ Security and Access Control

Why It Matters

Search may expose sensitive content.

Examples:

Private documents
Deleted posts
Blocked users
Region-restricted content
Enterprise permissions

Strategies

Index only searchable content
Store permission metadata in index
Apply filter during retrieval
Re-check permissions at read time
Remove deleted content quickly

👉 Interview Answer

Search must enforce access control carefully.

Even if permissions are indexed, I would still perform read-time permission checks for sensitive content.

This prevents stale index entries from exposing private, deleted, or blocked content.
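
A read-time check can be sketched as a final filter that consults the source of truth for each hit. The `status` field mirrors the document table above, while `visibility`, `allowedUsers`, and the `load_document` callback are hypothetical names introduced for this example.

```python
def filter_visible(results, user_id, load_document):
    """Drop hits the user may no longer see, based on the source of truth."""
    visible = []
    for hit in results:
        doc = load_document(hit["documentId"])
        if doc is None or doc["status"] != "active":
            continue  # deleted or blocked after it was indexed
        if doc["visibility"] == "private" and user_id not in doc["allowedUsers"]:
            continue  # private document the user is not allowed to read
        visible.append(hit)
    return visible

# In-memory stand-in for the document store.
document_store = {
    "doc1": {"status": "active", "visibility": "public", "allowedUsers": []},
    "doc2": {"status": "deleted", "visibility": "public", "allowedUsers": []},
    "doc3": {"status": "active", "visibility": "private", "allowedUsers": ["alice"]},
}
stale_results = [{"documentId": d} for d in ["doc1", "doc2", "doc3", "doc4"]]

print(filter_visible(stale_results, "bob", document_store.get))
print(filter_visible(stale_results, "alice", document_store.get))
```

Since this adds a lookup per result, it is typically applied only to the final page of results, ideally as one batched read rather than one call per document.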

1️⃣7️⃣ Observability

Key Metrics

Query latency p50 / p95 / p99
Query QPS
Indexing lag
Search error rate
Empty result rate
Cache hit rate
Ranking latency
Shard failure rate
Relevance metrics
Click-through rate

Logs

Track:

query_id
user_id
query_text
filters
shards_queried
latency
result_count
ranking_version

👉 Interview Answer

Observability is critical for search quality and reliability.

I would monitor query latency, indexing lag, empty result rate, shard failures, cache hit rate, and ranking latency.

I would also track relevance metrics such as click-through rate and successful search sessions.

1️⃣8️⃣ End-to-End Flow

Indexing Flow

Document created
→ Event published
→ Indexing worker parses document
→ Update inverted index
→ Update ranking features
→ Document becomes searchable

Search Flow

User submits query
→ Query understanding
→ Retrieve candidates from shards
→ Merge results
→ Rank / re-rank
→ Apply permission checks
→ Return results

Autocomplete Flow

User types prefix
→ Query autocomplete index
→ Rank suggestions by popularity
→ Return suggestions

Key Insight

Search is a pipeline system, not just a database query.

🧠 Staff-Level Answer (Final)

👉 Interview Answer (Full Version)

When designing a search system, I think of it as a pipeline of indexing, retrieval, ranking, and serving.

Documents are stored in a source-of-truth document store, while the search index is a read-optimized structure used for fast retrieval.

For query understanding, I would normalize the query, tokenize it, apply typo correction, synonym expansion, and potentially detect user intent.

For retrieval, I would use an inverted index for keyword search, and optionally vector search for semantic matching. In modern systems, hybrid retrieval is common, combining keyword search, vector search, and filters.

Ranking is usually multi-stage. First, we retrieve candidates using cheap signals like BM25. Then we apply more expensive ranking models, and finally re-rank results for diversity, freshness, safety, and business constraints.

For indexing, I would use an asynchronous near-real-time pipeline. Document updates publish events, indexing workers update the inverted index, and the system keeps indexing lag low.

To scale, I would shard indexes, replicate shards for availability, cache hot queries and posting lists, and use a coordinator to merge results from multiple shards.

The main trade-offs are latency, relevance, freshness, recall, precision, and cost.

Search index updates can usually be eventually consistent, but deleted, blocked, or permission-sensitive content needs stronger correctness through read-time checks.

Ultimately, the goal is to return relevant results quickly, while keeping the index fresh, scalable, and reliable.

⭐ Final Insight

The core of a Search System is not a simple database lookup; it is returning the most relevant results quickly over large-scale data through indexing, retrieval, and ranking.

Chinese Version (translated to English)

🎯 Design Search System

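1️⃣ Core Framework

When designing a Search System, I usually analyze it from the following angles:

Core flows: ingest document, build index, run search query
Query understanding and query normalization
Retrieval strategy: keyword, filters, vector search
Ranking and personalization
Indexing strategy and freshness
Scaling, sharding, caching, and replication
Core trade-offs: latency vs relevance vs freshness
Failure handling and consistency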

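2️⃣ Core Requirements

Functional Requirements

Users can search documents / products / posts
Support keyword search
Support filters and sorting
Support autocomplete
Support typo tolerance
Support ranking by relevance
Make new content searchable quickly
Support pagination

Non-functional Requirements

Low search latency
High availability
High recall and high precision
Scalable indexing pipeline
Near-real-time index updates
Support high query QPS
Graceful degradation during partial failures

👉 Interview Answer

A Search System needs to ingest content, build searchable indexes, retrieve candidate results for a query, and rank them by relevance.

The core challenge is balancing latency, relevance, freshness, and scalability.

Search is not a single algorithm; it is a pipeline made up of indexing, retrieval, ranking, and serving.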

3️⃣ Main APIs

Index Document

POST /api/index

Request:

{
  "documentId": "doc123",
  "title": "Distributed Systems Basics",
  "body": "This article explains replication and sharding.",
  "tags": ["system-design", "database"],
  "createdAt": "2026-05-02T10:00:00Z"
}

Search

GET /api/search?q=sharding&limit=20&cursor=xxx

Response:

{
  "results": [
    {
      "documentId": "doc123",
      "title": "Distributed Systems Basics",
      "snippet": "This article explains replication and sharding.",
      "score": 0.92
    }
  ],
  "nextCursor": "abc"
}

Autocomplete

GET /api/autocomplete?q=shar

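👉 Interview Answer

I would provide APIs for document indexing, search, and autocomplete.

The Search API needs to be optimized for low latency, while the indexing API can usually be asynchronous: the write is acknowledged once the event is queued, which improves write throughput at the cost of a short delay before the document becomes searchable.

4️⃣ Data Model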

Document Table

document (
  document_id VARCHAR PRIMARY KEY,
  title TEXT,
  body TEXT,
  author_id VARCHAR,
  created_at TIMESTAMP,
  updated_at TIMESTAMP,
  status VARCHAR,
  metadata JSON
)

Inverted Index

term → posting list

Example:

"sharding" → [
  { documentId: "doc123", frequency: 3, positions: [10, 42, 90] },
  { documentId: "doc456", frequency: 1, positions: [15] }
]

Forward Index

document_id → terms / metadata

Used for:

Re-indexing
Highlighting
Debugging
Feature extraction

Ranking Feature Store

ranking_feature (
  document_id VARCHAR PRIMARY KEY,
  click_count BIGINT,
  view_count BIGINT,
  freshness_score DOUBLE,
  quality_score DOUBLE,
  updated_at TIMESTAMP
)

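👉 Interview Answer

I would store the original documents separately from the search index.

The document store is the source of truth, while the inverted index is a structure optimized for retrieval.

I may also maintain a forward index and a ranking feature store for highlighting, re-indexing, and ranking feature computation.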

5️⃣ Query Understanding

Responsibilities

Tokenization
Lowercasing
Stop word removal
Stemming / lemmatization
Synonym expansion
Typo correction
Intent understanding

Example

"best running shoes for winter"

Can be understood as:

intent = product search
terms = running shoes, winter
filters = category: shoes
ranking boost = seasonal relevance

Advanced Techniques

Query rewriting
Spell correction
Synonym dictionary
Entity extraction
Language detection
Personalization context

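👉 Interview Answer

Query understanding transforms the raw query entered by the user into a structured search request.

It may include tokenization, normalization, spelling correction, synonym expansion, and intent detection.

This step is important because if the query is poorly understood, the correct results may never be retrieved, even if the downstream ranking is strong.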

6️⃣ Retrieval / Candidate Generation

Goal

Quickly retrieve a manageable candidate set.

query → top N candidate documents

Usually:

N = 100 to 10,000

Retrieval Methods

Keyword Retrieval

Uses:

Inverted index
BM25
TF-IDF

Good for:

Exact keyword matching
Structured search
High-precision scenarios

Vector Retrieval

Uses:

Embeddings
ANN index
HNSW / IVF / PQ

Good for:

Semantic search
Natural language queries
Synonym-like matching

Filter Retrieval

Examples:

category = electronics
price < 100
created_at > last_30_days

Hybrid Retrieval

keyword retrieval + vector retrieval + filters

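👉 Interview Answer

The goal of retrieval is to generate a candidate set efficiently.

I would use an inverted index for keyword search, and add vector search for semantic matching where needed.

In modern systems, hybrid retrieval is common, combining keyword search, vector search, and structured filters.

Retrieval aims for high recall and low latency, because the later ranking stage handles fine-grained ordering.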

7️⃣ Ranking

Goal

Order candidate results by relevance.

Ranking Signals

Text relevance score
Query-document match
Click-through rate
Freshness
Popularity
User personalization
Geographic relevance
Business rules
Content quality
Safety signals

Ranking Pipeline

Candidate Retrieval
→ Lightweight Scoring
→ ML Ranking
→ Re-ranking
→ Final Results

Multi-stage Ranking

Stage 1: First-pass Ranking

BM25 score
Simple freshness boost
Cheap features

Stage 2: ML Ranking

Gradient boosted trees
Neural ranking model
Learning-to-rank model

Stage 3: Re-ranking

Goals:

Diversity
Freshness
Deduplication
Safety filtering
Business constraints

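👉 Interview Answer

Ranking determines the final order of search results.

I would use a multi-stage ranking pipeline. The first stage uses cheap relevance signals, while later stages use more complex ML models.

Finally, re-ranking can be used to enforce diversity, freshness, safety, and business constraints.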

8️⃣ Indexing Pipeline

Basic Flow

Document created / updated
→ Event published
→ Indexing worker consumes event
→ Parse and normalize document
→ Build index entries
→ Update inverted index
→ Update ranking features
→ Make document searchable

Batch Indexing

Used for:

Large backfills
Offline index rebuilds
Reprocessing all documents

Pros:

Efficient
Easy to optimize

Cons:

Less fresh

Real-time Indexing

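Used for:

Newly created documents
Content with high freshness requirements
Time-sensitive search

Pros:

High freshness

Cons:

More complex
More expensive

Near-real-time Indexing

Recommended for many systems.

Documents become searchable after a small delay, usually seconds.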

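👉 Interview Answer

I would use an asynchronous indexing pipeline.

When a document is created or updated, the source service publishes an event.

Indexing workers consume the event, parse the document, update the inverted index, and refresh ranking features.

Most production systems use near-real-time indexing to balance freshness and cost.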

9️⃣ Autocomplete

Requirements

Return suggestions as the user types
Very low latency
Support popular queries
Support typo tolerance

Data Structures

Trie
Prefix index
N-gram index
Popular query cache

Example

Input: "shar"
Suggestions:
- sharding
- shared database
- shard key

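👉 Interview Answer

Autocomplete should be optimized separately from full search.

I would use a prefix index or trie, combined with popular query statistics.

Because autocomplete is triggered on every keystroke, it must be extremely low latency and heavily cached.

🔟 Caching Strategy

What to Cache?

Popular query results
Posting lists for hot terms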
Ranking features
Document metadata
Autocomplete suggestions
User personalization features

Cache Layers

Local cache
Redis / Memcached
CDN for public search pages
Search engine internal cache

Cache Challenges

Freshness
Personalization
Filter combinations
Stale ranking features

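👉 Interview Answer

Caching is important because many search queries are repeated.

I would cache popular query results, hot posting lists, document metadata, ranking features, and autocomplete suggestions.

However, caching search results is tricky because freshness, filters, and personalization can all change the result set.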

1️⃣1️⃣ Sharding and Replication

Why Shard?

The index is too large to fit on a single node
Query QPS is too high
Horizontal scaling is needed

Sharding Strategies

Document-based Sharding

document_id hash → shard

Pros:

Even distribution
Easy to scale

Cons:

Queries may need to fan out to multiple shards

Term-based Sharding

term → shard

Pros:

Some queries can hit fewer shards

Cons:

Hot terms can easily create hot shards

Hybrid Sharding

Commonly used in large systems.

Replication

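Used for:

High availability
Read scaling
Failover

👉 Interview Answer

I would shard the search index horizontally, commonly by document ID, because that gives a more even distribution.

Search queries may need to fan out to multiple shards, and each shard returns its top K results.

The coordinator then merges them and ranks globally.

Replication improves availability and read throughput.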

1️⃣2️⃣ Query Serving Flow

Flow

User query
→ API Gateway
→ Search Service
→ Query Understanding
→ Query Planner
→ Fan out to index shards
→ Retrieve candidates
→ Merge results
→ Rank / re-rank
→ Fetch document metadata
→ Return results

Shard Merge

Each shard returns:

top K local results

Coordinator merges:

global top K results

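👉 Interview Answer

In query serving, the search service first understands the query, then fans out the request to the relevant index shards.

Each shard returns its local top results.

The coordinator merges these results, runs ranking or re-ranking, fetches metadata, and returns the final results.

1️⃣3️⃣ Core Trade-offs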

Latency vs Relevance

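More ranking features improve relevance
But increase latency

Freshness vs Cost

Real-time indexing improves freshness
But cost and complexity are higher

Recall vs Precision

A larger candidate set improves recall
But admits more low-relevance candidates and makes ranking slower

Personalization vs Cacheability

Personalized results improve relevance
But reduce cache hit rate

Consistency vs Availability

Search results can be eventually consistent
Query serving should remain highly available

👉 Interview Answer

The core trade-offs are latency, relevance, freshness, recall, precision, and cost.

Larger candidate sets and stronger ranking models improve relevance, but they increase latency.

Real-time indexing improves freshness, but it increases system complexity.

For most search systems, eventual consistency is acceptable for index updates.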

1️⃣4️⃣ Failure Handling

Common Failures

Index shard unavailable
Indexing pipeline delayed
Ranking service down
Cache unavailable
Stale index
Hot query overload
Bad document causing indexing failure

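Strategies

Return partial results
Retry shard query
Use replica shard
Fall back to simple ranking
Use cached results
Send indexing failures to a dead-letter queue
Rebuild the index from the source of truth

👉 Interview Answer

Search systems should degrade gracefully.

If a shard is unavailable, the system can return partial results instead of failing the whole query.

If the ranking service fails, we can fall back to BM25 or simple relevance scoring.

Because the document store is the source of truth, the index can be rebuilt when necessary.

1️⃣5️⃣ Consistency Model

Stronger Consistency Needed For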

Source document storage
Access control / permission checks
Deleted or blocked content
Compliance-sensitive content removal

Eventual Consistency Acceptable For

Search index updates
Ranking feature updates
Popularity counters
Analytics
Autocomplete suggestions

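👉 Interview Answer

The search index usually does not need strong consistency.

It is usually acceptable for a newly created document to appear in search results a few seconds late.

However, deleted, blocked, or permission-sensitive content needs stronger correctness, so I would perform read-time permission checks before returning results.

1️⃣6️⃣ Security and Access Control

Why It Matters

Search may expose sensitive content.

Examples: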

Private documents
Deleted posts
Blocked users
Region-restricted content
Enterprise permissions

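Strategies

Index only searchable content
Store permission metadata in the index
Apply a permission filter during retrieval
Re-check permissions at read time
Remove deleted content quickly

👉 Interview Answer

Search must enforce access control carefully.

Even if permissions are already written into the index, I would still perform a read-time permission check for sensitive content.

This prevents stale index entries from exposing private, deleted, or blocked content.

1️⃣7️⃣ Observability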

Key Metrics

Query latency p50 / p95 / p99
Query QPS
Indexing lag
Search error rate
Empty result rate
Cache hit rate
Ranking latency
Shard failure rate
Relevance metrics
Click-through rate

Logs

Track:

query_id
user_id
query_text
filters
shards_queried
latency
result_count
ranking_version

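👉 Interview Answer

Observability is very important for search quality and reliability.

I would monitor query latency, indexing lag, empty result rate, shard failures, cache hit rate, and ranking latency.

I would also track relevance metrics such as click-through rate and successful search sessions.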

1️⃣8️⃣ End-to-End Flow

Indexing Flow

Document created
→ Event published
→ Indexing worker parses document
→ Update inverted index
→ Update ranking features
→ Document becomes searchable

Search Flow

User submits query
→ Query understanding
→ Retrieve candidates from shards
→ Merge results
→ Rank / re-rank
→ Apply permission checks
→ Return results

Autocomplete Flow

User types prefix
→ Query autocomplete index
→ Rank suggestions by popularity
→ Return suggestions

Key Insight

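Search is a pipeline system, not a simple database query.

🧠 Staff-Level Answer (Final)

👉 Interview Answer (Full Recitation Version)

When designing a Search System, I treat it as a pipeline made up of indexing, retrieval, ranking, and serving.

Documents are stored in a source-of-truth document store, while the search index is a read-optimized structure designed for fast retrieval.

For query understanding, I would normalize the query, tokenize it, apply spelling correction and synonym expansion, and detect user intent depending on the scenario.

For retrieval, I would use an inverted index for keyword search, and add vector search for semantic matching where needed. In modern systems, hybrid retrieval is common, combining keyword search, vector search, and filters.

Ranking is usually multi-stage. The first stage retrieves candidates using cheap signals such as BM25, then a more complex ranking model is applied, and finally re-ranking handles diversity, freshness, safety, and business constraints.

For indexing, I would use an asynchronous near-real-time pipeline. Document updates publish events, indexing workers update the inverted index, and indexing lag is kept under control.

To scale, I would shard the index, use replicas to improve availability, cache hot queries and posting lists, and use a coordinator to merge results from multiple shards.

The core trade-offs are latency, relevance, freshness, recall, precision, and cost.

Search index updates can usually be eventually consistent, but deleted, blocked, or permission-sensitive content needs read-time checks to guarantee correctness.

The ultimate goal is to return the most relevant search results quickly while keeping the index fresh, scalable, and reliable.

⭐ Final Insight

The core of a Search System is not a simple database lookup; it is returning the most relevant results quickly over large-scale data through indexing, retrieval, and ranking.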