🎯 Design Search System
1️⃣ Core Framework
When discussing Search System design, I frame it as:
- Core flows: ingest document, index document, search query
- Query understanding and normalization
- Retrieval strategy: keyword, filters, vector search
- Ranking and personalization
- Indexing strategy and freshness
- Scaling, sharding, caching, and replication
- Trade-offs: latency vs relevance vs freshness
- Failure handling and consistency
2️⃣ Core Requirements
Functional Requirements
- User can search documents / products / posts
- Support keyword search
- Support filters and sorting
- Support autocomplete
- Support typo tolerance
- Support ranking by relevance
- Support freshness for newly created content
- Support pagination
Non-functional Requirements
- Low search latency
- High availability
- High recall and precision
- Scalable indexing pipeline
- Near-real-time index updates
- Support high query QPS
- Graceful degradation during partial failures
👉 Interview Answer
A search system needs to ingest content, build searchable indexes, retrieve candidate results, and rank them by relevance.
The main challenge is balancing latency, relevance, freshness, and scalability.
Search is not a single algorithm; it is a pipeline of indexing, retrieval, ranking, and serving.
3️⃣ Main APIs
Index Document
POST /api/index
Request:
{
  "documentId": "doc123",
  "title": "Distributed Systems Basics",
  "body": "This article explains replication and sharding.",
  "tags": ["system-design", "database"],
  "createdAt": "2026-05-02T10:00:00Z"
}
Search
GET /api/search?q=sharding&limit=20&cursor=xxx
Response:
{
  "results": [
    {
      "documentId": "doc123",
      "title": "Distributed Systems Basics",
      "snippet": "This article explains replication and sharding.",
      "score": 0.92
    }
  ],
  "nextCursor": "abc"
}
Autocomplete
GET /api/autocomplete?q=shar
👉 Interview Answer
I would expose APIs for indexing documents, searching documents, and supporting autocomplete.
Search APIs should be optimized for low latency, while indexing APIs can usually be asynchronous: the write is acknowledged first and indexed shortly after, which improves write throughput at the cost of a small freshness delay.
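To make the opaque `cursor` in the search API concrete, here is a minimal sketch, assuming the cursor simply encodes the sort key (score, documentId) of the last result on the previous page. `encode_cursor`, `decode_cursor`, and `next_page` are hypothetical names, and sorting an in-memory list stands in for the search engine doing this work.

```python
import base64
import json


def encode_cursor(score: float, document_id: str) -> str:
    """Pack the sort key of the last returned result into an opaque token."""
    raw = json.dumps({"s": score, "d": document_id}).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_cursor(cursor: str) -> tuple[float, str]:
    """Recover (score, document_id) from a token made by encode_cursor."""
    data = json.loads(base64.urlsafe_b64decode(cursor.encode("ascii")))
    return data["s"], data["d"]


def next_page(results, limit, cursor=None):
    """Return one page sorted by (score desc, documentId asc).

    `results` is a list of dicts with "documentId" and "score" keys (hypothetical shape).
    """
    ordered = sorted(results, key=lambda r: (-r["score"], r["documentId"]))
    if cursor is not None:
        last_score, last_id = decode_cursor(cursor)
        # Keep only items whose sort key comes strictly after the last returned one.
        ordered = [
            r for r in ordered
            if (-r["score"], r["documentId"]) > (-last_score, last_id)
        ]
    page = ordered[:limit]
    has_more = len(ordered) > limit
    next_cursor = (
        encode_cursor(page[-1]["score"], page[-1]["documentId"]) if has_more else None
    )
    return {"results": page, "nextCursor": next_cursor}
```

Using documentId as a tie-breaker keeps page boundaries stable when several results share a score; without it, results with equal scores could repeat or be skipped across pages.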
4️⃣ Data Model
Document Table
document (
  document_id VARCHAR PRIMARY KEY,
  title TEXT,
  body TEXT,
  author_id VARCHAR,
  created_at TIMESTAMP,
  updated_at TIMESTAMP,
  status VARCHAR,
  metadata JSON
)
Inverted Index
term → posting list
Example:
"sharding" → [
{ documentId: "doc123", frequency: 3, positions: [10, 42, 90] },
{ documentId: "doc456", frequency: 1, positions: [15] }
]
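The inverted index layout above can be built with a sketch like the following. It assumes a deliberately simple tokenizer (lowercase, split on non-alphanumeric characters) and records token positions, not character offsets; `tokenize` and `build_inverted_index` are hypothetical names.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Very simple tokenizer: lowercase and split on non-alphanumeric characters.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs: dict[str, str]) -> dict[str, list[dict]]:
    """Map each term to a posting list of {documentId, frequency, positions}."""
    # term -> document_id -> list of token positions
    positions_by_term = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(tokenize(text)):
            positions_by_term[term].setdefault(doc_id, []).append(pos)
    index = {}
    for term, per_doc in positions_by_term.items():
        # Posting lists sorted by document ID, as most engines store them.
        index[term] = [
            {"documentId": d, "frequency": len(p), "positions": p}
            for d, p in sorted(per_doc.items())
        ]
    return index
```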
Forward Index
document_id → terms / metadata
Used for:
- Re-indexing
- Highlighting
- Debugging
- Feature extraction
Ranking Feature Store
ranking_feature (
  document_id VARCHAR PRIMARY KEY,
  click_count BIGINT,
  view_count BIGINT,
  freshness_score DOUBLE,
  quality_score DOUBLE,
  updated_at TIMESTAMP
)
👉 Interview Answer
I would store the original documents separately from the search index.
The document store is the source of truth, while the inverted index is optimized for retrieval.
I may also maintain a forward index and ranking feature store to support highlighting, re-indexing, and ranking.
5️⃣ Query Understanding
Responsibilities
- Tokenization
- Lowercasing
- Stop word removal
- Stemming / lemmatization
- Synonym expansion
- Typo correction
- Intent understanding
Example
"best running shoes for winter"
Can be understood as:
intent = product search
terms = running shoes, winter
filters = category: shoes
ranking boost = seasonal relevance
Advanced Techniques
- Query rewriting
- Spell correction
- Synonym dictionary
- Entity extraction
- Language detection
- Personalization context
👉 Interview Answer
Query understanding transforms raw user input into a structured search request.
It may include tokenization, normalization, spelling correction, synonym expansion, and intent detection.
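This step is important because ranking can only reorder the candidates that retrieval returns: if the query is poorly understood, the right documents may never be retrieved, and even a strong ranking model cannot recover them.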
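A minimal sketch of the normalization steps listed above, assuming tiny hard-coded stop word, synonym, and vocabulary sets (all hypothetical). Stemming here only strips a trailing plural `s`, standing in for a real stemmer or lemmatizer, and typo correction uses Python's `difflib` closeness matching against the vocabulary; intent detection is not shown.

```python
import difflib
import re

# Hypothetical resources; a real system would load curated dictionaries.
STOP_WORDS = {"a", "an", "the", "for", "of", "to", "and"}
SYNONYMS = {"sneaker": ["shoe"], "db": ["database"]}
VOCABULARY = ["running", "shoe", "winter", "sharding", "database"]


def light_stem(token: str) -> str:
    # Crude plural stripping; production systems use a real stemmer or lemmatizer.
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def correct_typo(token: str) -> str:
    # Replace unknown tokens with the closest vocabulary word, if one is close enough.
    if token in VOCABULARY:
        return token
    matches = difflib.get_close_matches(token, VOCABULARY, n=1, cutoff=0.8)
    return matches[0] if matches else token


def normalize_query(raw: str) -> dict:
    tokens = re.findall(r"[a-z0-9]+", raw.lower())           # tokenize + lowercase
    tokens = [t for t in tokens if t not in STOP_WORDS]     # stop word removal
    tokens = [correct_typo(light_stem(t)) for t in tokens]  # stem, then spell-correct
    expanded = []
    for t in tokens:                                        # synonym expansion
        for term in [t] + SYNONYMS.get(t, []):
            if term not in expanded:
                expanded.append(term)
    return {"terms": tokens, "expanded_terms": expanded}
```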
6️⃣ Retrieval / Candidate Generation
Goal
Retrieve a manageable candidate set quickly.
query → top N candidate documents
Usually:
N = 100 to 10,000
Retrieval Methods
Keyword Retrieval
Uses:
- Inverted index
- BM25
- TF-IDF
Good for:
- Exact keyword matching
- Structured search
- High precision
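As an illustration of keyword scoring, this sketch computes Okapi BM25 over pre-tokenized documents. It assumes the common defaults k1 = 1.2 and b = 0.75 and the non-negative IDF variant `log(1 + (N - df + 0.5) / (df + 0.5))`; a real engine reads these statistics from the inverted index rather than rescanning documents. `bm25_scores` is a hypothetical name.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score tokenized docs against query terms with Okapi BM25.

    docs: dict of document_id -> list of (already normalized) tokens; must be non-empty.
    """
    n_docs = len(docs)
    avgdl = sum(len(toks) for toks in docs.values()) / n_docs
    df = Counter()                      # document frequency per term
    for toks in docs.values():
        df.update(set(toks))
    scores = {}
    for doc_id, toks in docs.items():
        tf = Counter(toks)              # term frequency within this document
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[doc_id] = score
    return scores
```

The length normalization term (controlled by `b`) is why a short document can outscore a longer one with the same term frequency.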
Vector Retrieval
Uses:
- Embeddings
- ANN index
- HNSW / IVF / PQ
Good for:
- Semantic search
- Natural language queries
- Synonym-like matching
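ANN indexes such as HNSW, IVF, and PQ approximate exact nearest-neighbour search. The brute-force version below, assuming embeddings are plain lists of floats produced by some model, is the O(N) baseline they avoid; it is still useful offline as ground truth for measuring an ANN index's recall. `cosine` and `exact_top_k` are hypothetical names.

```python
import heapq
import math


def cosine(a, b):
    # Cosine similarity; returns 0.0 if either vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def exact_top_k(query_vec, doc_vecs, k):
    """Brute-force k nearest neighbours by cosine similarity.

    doc_vecs: dict of document_id -> embedding. Returns [(score, document_id), ...].
    """
    return heapq.nlargest(
        k,
        ((cosine(query_vec, vec), doc_id) for doc_id, vec in doc_vecs.items()),
    )
```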
Filter Retrieval
Examples:
category = electronics
price < 100
created_at > last_30_days
Hybrid Retrieval
keyword retrieval + vector retrieval + filters
👉 Interview Answer
Retrieval focuses on generating a candidate set efficiently.
I would use an inverted index for keyword search, and optionally vector search for semantic matching.
In modern systems, hybrid retrieval is common, combining keyword search, vector search, and structured filters.
The goal is high recall with low latency, because ranking will refine the final ordering.
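One common way to combine keyword and vector results is Reciprocal Rank Fusion (RRF), which sums `1 / (k + rank)` across ranked lists; it is swapped in here as an example and is not the only fusion method. This sketch assumes each retriever returns a ranked list of document IDs, uses the conventional `k = 60`, and supports only `=`, `<`, and `>` filters; function names are hypothetical.

```python
from collections import defaultdict

OPS = {
    "=": lambda a, b: a == b,
    "<": lambda a, b: a < b,
    ">": lambda a, b: a > b,
}


def passes_filters(meta, filters):
    """Each filter is (field, op, value); a missing field fails the filter."""
    return all(field in meta and OPS[op](meta[field], value) for field, op, value in filters)


def hybrid_retrieve(keyword_ranked, vector_ranked, metadata, filters, k=60, limit=10):
    """Fuse two ranked lists of document IDs with Reciprocal Rank Fusion, then filter."""
    fused = defaultdict(float)
    for ranked in (keyword_ranked, vector_ranked):
        for rank, doc_id in enumerate(ranked, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    candidates = [d for d in fused if passes_filters(metadata.get(d, {}), filters)]
    candidates.sort(key=lambda d: (-fused[d], d))   # higher fused score first
    return candidates[:limit]
```

Filtering after fusion keeps the example short. In practice filters are often pushed down into retrieval so that filtered-out documents do not crowd the candidate set.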
7️⃣ Ranking
Goal
Order candidates by relevance.
Ranking Signals
- Text relevance score
- Query-document match
- Click-through rate
- Freshness
- Popularity
- User personalization
- Geographic relevance
- Business rules
- Content quality
- Safety signals
Ranking Pipeline
Candidate Retrieval
→ Lightweight Scoring
→ ML Ranking
→ Re-ranking
→ Final Results
Multi-stage Ranking
Stage 1: First-pass Ranking
- BM25 score
- Simple freshness boost
- Cheap features
Stage 2: ML Ranking
- Gradient boosted trees
- Neural ranking model
- Learning-to-rank model
Stage 3: Re-ranking
Goals:
- Diversity
- Freshness
- Deduplication
- Safety filtering
- Business constraints
👉 Interview Answer
Ranking determines the final ordering of search results.
I would use a multi-stage ranking pipeline. The first stage uses cheap relevance signals, while later stages use more expensive ML models.
Finally, re-ranking can enforce diversity, safety, freshness, and business constraints.
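The three stages can be sketched as below. Everything here is an assumption for illustration: the `Candidate` fields, the freshness boost with a 24-hour decay, and the hand-picked linear weights standing in for a trained GBDT or neural ranker. Re-ranking only removes unsafe items and duplicates sharing a `dedup_key`.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    doc_id: str
    bm25: float
    ctr: float          # historical click-through rate, 0..1
    age_hours: float
    quality: float      # 0..1
    dedup_key: str      # e.g. canonical URL or content hash
    is_safe: bool = True


def first_pass(cands, keep):
    # Stage 1: cheap score = BM25 plus a small, exponentially decaying freshness boost.
    def cheap(c):
        return c.bm25 + 0.5 * math.exp(-c.age_hours / 24)
    return sorted(cands, key=cheap, reverse=True)[:keep]


def ml_score(c):
    # Stage 2 stand-in: hand-weighted linear model; a trained ranker would go here.
    return 0.6 * c.bm25 + 2.0 * c.ctr + 1.0 * c.quality


def rerank(cands, limit):
    # Stage 3: drop unsafe items and duplicates, keeping the best item per dedup_key.
    seen, out = set(), []
    for c in sorted(cands, key=ml_score, reverse=True):
        if not c.is_safe or c.dedup_key in seen:
            continue
        seen.add(c.dedup_key)
        out.append(c.doc_id)
        if len(out) == limit:
            break
    return out


def rank(cands, first_pass_keep=100, limit=10):
    return rerank(first_pass(cands, first_pass_keep), limit)
```

The key design point is that the expensive scorer only sees the `first_pass_keep` survivors of the cheap stage.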
8️⃣ Indexing Pipeline
Basic Flow
Document created / updated
→ Event published
→ Indexing worker consumes event
→ Parse and normalize document
→ Build index entries
→ Update inverted index
→ Update ranking features
→ Make document searchable
Batch Indexing
Used for:
- Large backfills
- Offline rebuilds
- Reprocessing all documents
Pros:
- Efficient
- Easy to optimize
Cons:
- Less fresh
Real-time Indexing
Used for:
- Newly created documents
- Fresh content
- Time-sensitive search
Pros:
- High freshness
Cons:
- More complex
- More expensive
Near-real-time Indexing
Recommended for many systems: documents become searchable after a small delay, usually seconds.
👉 Interview Answer
I would use an asynchronous indexing pipeline.
When a document is created or updated, the source service publishes an event.
Indexing workers consume the event, parse the document, update the inverted index, and refresh ranking features.
Many production systems use near-real-time indexing to balance freshness and cost.
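A sketch of an indexing worker, assuming each event carries a per-document version that increases with every change (for example derived from `updated_at`). Version checks make the worker idempotent and safe against out-of-order delivery, tombstones stop an old upsert from resurrecting a deleted document, and failures go to a dead-letter list. The in-memory dict stands in for the real index; class and field names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class IndexEvent:
    document_id: str
    version: int            # assumed to increase with every change to the document
    op: str                 # "upsert" or "delete"
    body: str = ""


@dataclass
class IndexingWorker:
    index: dict = field(default_factory=dict)        # document_id -> (version, tokens)
    tombstones: dict = field(default_factory=dict)   # document_id -> version of delete
    dead_letters: list = field(default_factory=list)

    def handle(self, event: IndexEvent) -> None:
        try:
            current = max(
                self.index.get(event.document_id, (-1, None))[0],
                self.tombstones.get(event.document_id, -1),
            )
            if event.version <= current:
                return  # stale or duplicate event: ignore, which keeps handling idempotent
            if event.op == "delete":
                self.index.pop(event.document_id, None)
                self.tombstones[event.document_id] = event.version
            elif event.op == "upsert":
                tokens = event.body.lower().split()
                self.index[event.document_id] = (event.version, tokens)
            else:
                raise ValueError(f"unknown op {event.op!r}")
        except Exception as exc:
            # Bad events are parked in the dead-letter list instead of blocking the stream.
            self.dead_letters.append((event, repr(exc)))
```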
9️⃣ Autocomplete
Requirements
- Return suggestions as user types
- Very low latency
- Handle popular queries
- Handle typo tolerance
Data Structures
- Trie
- Prefix index
- N-gram index
- Popular query cache
Example
Input: "shar"
Suggestions:
- sharding
- shared database
- shard key
👉 Interview Answer
Autocomplete should be optimized separately from full search.
I would use a prefix index or trie, combined with popular query statistics.
Since autocomplete is called on every keystroke, it must be extremely low latency and heavily cached.
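A minimal trie sketch that precomputes the top-k most popular completions at every node, so a lookup costs only O(length of prefix) at the price of extra memory. The `query_counts` input (query string to popularity count) is an assumption about what query statistics are available; typo tolerance is not handled, and the class names are hypothetical.

```python
class TrieNode:
    __slots__ = ("children", "top")

    def __init__(self):
        self.children = {}
        self.top = []   # (count, query) pairs for the most popular completions


class AutocompleteTrie:
    def __init__(self, k=3):
        self.root = TrieNode()
        self.k = k

    def build(self, query_counts):
        """query_counts: dict of query string -> popularity count."""
        for query, count in query_counts.items():
            node = self.root
            for ch in query:
                node = node.children.setdefault(ch, TrieNode())
                node.top.append((count, query))
        # Keep only the k most popular completions per node (ties broken alphabetically).
        stack = [self.root]
        while stack:
            node = stack.pop()
            node.top = sorted(node.top, key=lambda cq: (-cq[0], cq[1]))[: self.k]
            stack.extend(node.children.values())

    def suggest(self, prefix):
        node = self.root
        for ch in prefix.lower():
            node = node.children.get(ch)
            if node is None:
                return []
        return [q for _, q in node.top]
```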
🔟 Caching Strategy
What to Cache?
- Popular query results
- Posting lists for hot terms
- Ranking features
- Document metadata
- Autocomplete suggestions
- User personalization features
Cache Layers
- Local cache
- Redis / Memcached
- CDN for public search pages
- Search engine internal cache
Cache Challenges
- Freshness
- Personalization
- Filter combinations
- Stale ranking features
👉 Interview Answer
Caching is important because many search queries are repeated.
I would cache popular query results, hot posting lists, document metadata, ranking features, and autocomplete suggestions.
However, caching search results is tricky because freshness, filters, and personalization can change the result set.
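The sketch below shows two of the tricky parts: normalizing the cache key so equivalent requests (different spacing, case, or filter order) share one entry, and expiring entries with a TTL to bound staleness. It is an in-process stand-in for Redis or Memcached; the `index_version` field is a hypothetical way to invalidate everything by bumping a version, and personalized requests are assumed to bypass this cache.

```python
import hashlib
import json
import time


class QueryResultCache:
    """In-process TTL cache for search results (stand-in for Redis/Memcached)."""

    def __init__(self, ttl_seconds=30.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}

    @staticmethod
    def make_key(query, filters, sort="relevance", page_cursor=None, index_version="v1"):
        # Normalize so equivalent requests share one entry: case, whitespace, filter order.
        payload = {
            "q": " ".join(query.lower().split()),
            "f": sorted(filters.items()),
            "s": sort,
            "c": page_cursor,
            "v": index_version,
        }
        raw = json.dumps(payload, sort_keys=True, default=str)
        return hashlib.sha256(raw.encode("utf-8")).hexdigest()

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._store[key]     # expired: evict and report a miss
            return None
        return value

    def put(self, key, value):
        self._store[key] = (self.clock() + self.ttl, value)
```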
1️⃣1️⃣ Sharding and Replication
Why Shard?
- Index may be too large for one node
- Query QPS may be too high
- Need horizontal scale
Sharding Strategies
Document-based Sharding
document_id hash → shard
Pros:
- Even distribution
- Easy to scale
Cons:
- Queries must fan out to every shard, since any shard may hold matching documents
Term-based Sharding
term → shard
Pros:
- Query can hit fewer shards for some terms
Cons:
- Hot terms can create hot shards
Hybrid Sharding
Used in larger systems.
Replication
Used for:
- High availability
- Read scaling
- Failover
👉 Interview Answer
I would shard the index horizontally, commonly by document ID, because it gives a balanced distribution.
Because any shard may hold matching documents, a search query typically fans out to all shards, and each shard returns its local top K results.
The coordinator then merges and ranks results globally.
Replication improves availability and read throughput.
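Document-based sharding needs a hash that every node computes identically. Python's built-in `hash()` for strings is randomized per process, so this sketch uses an MD5 digest modulo the shard count; `shard_for` is a hypothetical name.

```python
import hashlib


def shard_for(document_id: str, num_shards: int) -> int:
    """Map a document to a shard with a hash that is stable across processes and nodes."""
    digest = hashlib.md5(document_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

With plain modulo, changing `num_shards` remaps most documents, which is why many systems fix the shard count up front or use consistent hashing.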
1️⃣2️⃣ Query Serving Flow
Flow
User query
→ API Gateway
→ Search Service
→ Query Understanding
→ Query Planner
→ Fan out to index shards
→ Retrieve candidates
→ Merge results
→ Rank / re-rank
→ Fetch document metadata
→ Return results
Shard Merge
Each shard returns:
top K local results
Coordinator merges:
global top K results
👉 Interview Answer
For query serving, the search service first understands the query, then fans out the request to relevant index shards.
Each shard returns local top results.
A coordinator merges those results, applies ranking or re-ranking, fetches metadata, and returns the final response.
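Merging is correct because the global top K must lie within the union of the local top K lists. A sketch, assuming each shard returns `(score, document_id)` pairs sorted by score descending with document ID as tie-breaker, and that scores are comparable across shards (per-shard term statistics can make BM25 scores differ slightly). `merge_shard_results` is a hypothetical name.

```python
import heapq
import itertools


def merge_shard_results(shard_results, k):
    """Merge per-shard top-K lists into a global top-K.

    shard_results: list of lists of (score, document_id), each already sorted by
    (score desc, document_id asc), i.e. each shard's local top K.
    """
    merged = heapq.merge(*shard_results, key=lambda r: (-r[0], r[1]))
    return list(itertools.islice(merged, k))
```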
1️⃣3️⃣ Trade-offs
Latency vs Relevance
- More ranking features improve relevance
- But increase latency
Freshness vs Cost
- Real-time indexing improves freshness
- But costs more and increases complexity
Recall vs Precision
- Large candidate set improves recall
- But makes ranking slower
Personalization vs Cacheability
- Personalized results improve relevance
- But reduce cache hit rate
Consistency vs Availability
- Search results can be eventually consistent
- Query serving should remain highly available
👉 Interview Answer
The main trade-offs are between latency, relevance, freshness, recall, precision, and cost.
Larger candidate sets and stronger ranking models improve relevance, but they increase latency.
Real-time indexing improves freshness, but increases system complexity.
In most search systems, eventual consistency is acceptable for index updates.
1️⃣4️⃣ Failure Handling
Common Failures
- Index shard unavailable
- Indexing pipeline delayed
- Ranking service down
- Cache unavailable
- Stale index
- Hot query overload
- Bad document causing indexing failure
Strategies
- Return partial results
- Retry shard query
- Use replica shard
- Fall back to simple ranking
- Use cached results
- Dead-letter queue for indexing failures
- Rebuild index from source of truth
👉 Interview Answer
Search systems should degrade gracefully.
If one shard is unavailable, the system may return partial results instead of failing the whole query.
If the ranking service is down, we can fall back to BM25 or simple relevance scoring.
Since the document store is the source of truth, the index can be rebuilt if necessary.
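A sketch of graceful degradation during fan-out, assuming each shard is a callable returning `(score, document_id)` pairs. Shards that raise or miss the deadline are reported in `failedShards` instead of failing the query, and if the ranker raises, results fall back to the shards' own scores (for example BM25). Retrying on a replica is left out; all names and the 200 ms timeout are hypothetical.

```python
import concurrent.futures as cf


def fan_out(shards, query, timeout_s=0.2):
    """Query every shard in parallel; keep whatever arrives before the deadline.

    shards: dict of shard_name -> callable(query) returning a list of (score, doc_id).
    """
    pool = cf.ThreadPoolExecutor(max_workers=len(shards))
    futures = {pool.submit(fn, query): name for name, fn in shards.items()}
    done, not_done = cf.wait(futures, timeout=timeout_s)
    results, failed = [], []
    for fut in done:
        if fut.exception() is None:
            results.extend(fut.result())
        else:
            failed.append(futures[fut])          # shard raised an error
    failed.extend(futures[f] for f in not_done)  # shard missed the deadline
    pool.shutdown(wait=False, cancel_futures=True)
    return results, sorted(failed)


def search(shards, query, ranker=None, k=10):
    hits, failed = fan_out(shards, query)
    try:
        ordered = ranker(hits) if ranker else sorted(hits, reverse=True)
    except Exception:
        ordered = sorted(hits, reverse=True)  # fall back to the shards' own scores
    return {"results": ordered[:k], "partial": bool(failed), "failedShards": failed}
```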
1️⃣5️⃣ Consistency Model
Stronger Consistency Needed For
- Source document storage
- Access control / permission checks
- Deleted or blocked content
- Compliance-sensitive content removal
Eventual Consistency Acceptable For
- Search index updates
- Ranking feature updates
- Popularity counters
- Analytics
- Autocomplete suggestions
👉 Interview Answer
The search index usually does not need strong consistency.
It is acceptable if a newly created document appears in search after a short delay.
However, deleted, blocked, or permission-sensitive content needs stronger correctness, so I would apply read-time permission checks before returning results.
1️⃣6️⃣ Security and Access Control
Why It Matters
Search may expose sensitive content.
Examples:
- Private documents
- Deleted posts
- Blocked users
- Region-restricted content
- Enterprise permissions
Strategies
- Index only searchable content
- Store permission metadata in index
- Apply filter during retrieval
- Re-check permissions at read time
- Remove deleted content quickly
👉 Interview Answer
Search must enforce access control carefully.
Even if permissions are indexed, I would still perform read-time permission checks for sensitive content.
This prevents stale index entries from exposing private, deleted, or blocked content.
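A sketch of a read-time check, assuming a batched lookup against the document store returns fresh status and ACL data. It fails closed: anything missing, deleted, or blocked in the source of truth is dropped even if the index still has it. `DocRecord`, `filter_visible`, and `fetch_records` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DocRecord:
    status: str                 # "active", "deleted", "blocked"
    owner_id: str
    visibility: str             # "public" or "private"
    allowed_users: frozenset = frozenset()


def filter_visible(hits, user_id, fetch_records):
    """Drop hits the user may not see, using fresh data from the source of truth.

    hits: ranked list of document IDs from the (possibly stale) index.
    fetch_records: callable(list_of_ids) -> dict of id -> DocRecord.
    """
    records = fetch_records(hits)
    visible = []
    for doc_id in hits:
        rec = records.get(doc_id)
        if rec is None or rec.status != "active":
            continue  # deleted, blocked, or missing in the source of truth: fail closed
        if rec.visibility == "public" or user_id == rec.owner_id or user_id in rec.allowed_users:
            visible.append(doc_id)
    return visible
```

Because filtering can shrink a page, the service may need to over-fetch candidates to still return `limit` results.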
1️⃣7️⃣ Observability
Key Metrics
- Query latency p50 / p95 / p99
- Query QPS
- Indexing lag
- Search error rate
- Empty result rate
- Cache hit rate
- Ranking latency
- Shard failure rate
- Relevance metrics
- Click-through rate
Logs
Track:
query_id
user_id
query_text
filters
shards_queried
latency
result_count
ranking_version
👉 Interview Answer
Observability is critical for search quality and reliability.
I would monitor query latency, indexing lag, empty result rate, shard failures, cache hit rate, and ranking latency.
I would also track relevance metrics such as click-through rate and successful search sessions.
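A sketch of two pieces of this: computing p50/p95/p99 with the nearest-rank method, and emitting one structured JSON log record per query with the fields listed above. Function names are hypothetical; a production system would typically use a metrics library with histograms rather than sorting raw samples.

```python
import json
import math


def percentile(samples, p):
    """Nearest-rank percentile (integer p in 1..100) of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def latency_summary(latencies_ms):
    return {f"p{p}": percentile(latencies_ms, p) for p in (50, 95, 99)}


def search_log_line(query_id, user_id, query_text, filters, shards_queried,
                    latency_ms, result_count, ranking_version):
    # One structured (JSON) record per query, using the logged fields listed above.
    return json.dumps({
        "query_id": query_id,
        "user_id": user_id,
        "query_text": query_text,
        "filters": filters,
        "shards_queried": shards_queried,
        "latency": latency_ms,
        "result_count": result_count,
        "ranking_version": ranking_version,
    })
```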
1️⃣8️⃣ End-to-End Flow
Indexing Flow
Document created
→ Event published
→ Indexing worker parses document
→ Update inverted index
→ Update ranking features
→ Document becomes searchable
Search Flow
User submits query
→ Query understanding
→ Retrieve candidates from shards
→ Merge results
→ Rank / re-rank
→ Apply permission checks
→ Return results
Autocomplete Flow
User types prefix
→ Query autocomplete index
→ Rank suggestions by popularity
→ Return suggestions
Key Insight
Search is a pipeline system, not just a database query.
🧠 Staff-Level Answer (Final)
👉 Interview Answer (Full Version)
When designing a search system, I think of it as a pipeline of indexing, retrieval, ranking, and serving.
Documents are stored in a source-of-truth document store, while the search index is a read-optimized structure used for fast retrieval.
For query understanding, I would normalize the query, tokenize it, apply typo correction, synonym expansion, and potentially detect user intent.
For retrieval, I would use an inverted index for keyword search, and optionally vector search for semantic matching. In modern systems, hybrid retrieval is common, combining keyword search, vector search, and filters.
Ranking is usually multi-stage. First, we retrieve candidates using cheap signals like BM25. Then we apply more expensive ranking models, and finally re-rank results for diversity, freshness, safety, and business constraints.
For indexing, I would use an asynchronous near-real-time pipeline. Document updates publish events, indexing workers update the inverted index, and the system keeps indexing lag low.
To scale, I would shard indexes, replicate shards for availability, cache hot queries and posting lists, and use a coordinator to merge results from multiple shards.
The main trade-offs are between latency, relevance, freshness, recall, precision, and cost.
Search index updates can usually be eventually consistent, but deleted, blocked, or permission-sensitive content needs stronger correctness through read-time checks.
Ultimately, the goal is to return relevant results quickly, while keeping the index fresh, scalable, and reliable.
⭐ Final Insight
The core of a Search System is not simply querying a database; it is using indexing, retrieval, and ranking to quickly return the most relevant results over large-scale data.