
🎯 Hybrid Search Systems: BM25 + Vector Search

1️⃣ Core Framework

When discussing hybrid search systems, I frame the topic around:

Why hybrid search is needed
BM25 keyword search
Vector semantic search
Candidate generation
Score normalization and merging
Re-ranking
Metadata filtering and access control
Trade-offs: exactness vs semantic recall

2️⃣ What Is Hybrid Search?

Hybrid search combines keyword search and vector search.

Hybrid Search = BM25 Keyword Search + Vector Search

The goal is to retrieve both:

Exact matches
Semantic matches

Basic Flow

User Query
→ BM25 Search + Vector Search (run in parallel)
→ Merge Candidates
→ Re-rank
→ Return Final Results

👉 Interview Answer

Hybrid search combines lexical search and semantic search.

BM25 keyword search finds exact terms, while vector search finds semantically similar content.

Together, they usually produce better retrieval quality than either method alone.

3️⃣ Why Hybrid Search Is Needed

Keyword Search Alone Fails When

The user uses different wording from the document.

User asks:
"How do I get my money back?"

Document says:
"Refund requests must be submitted within 30 days."

Keyword search may miss it, because the query and the document share no key terms ("money back" vs "refund").

Vector Search Alone Fails When

The query contains exact identifiers.

Query:
"ERR_PAYMENT_TIMEOUT_504"

Vector search may not prioritize the exact error code.

Hybrid Fix

BM25 catches exact terms.
Vector search catches semantic meaning.

👉 Interview Answer

Hybrid search is useful because keyword search and vector search fail in different ways.

Keyword search is strong for exact terms, while vector search is strong for semantic similarity.

Combining them usually improves recall, and with good fusion and re-ranking, precision as well.

4️⃣ BM25 Keyword Search

What Is BM25?

BM25 is a ranking function used in keyword search.

It scores documents based on:

Term frequency
Inverse document frequency
Document length normalization

Strengths

BM25 is strong for:

Exact words
Error codes
Product names
API names
IDs
Acronyms
Legal phrases

Example

Query:
"POST /v1/payments 500 error"

BM25 finds documents containing those exact terms.

👉 Interview Answer

BM25 is a classic keyword ranking method.

It works well when exact terms matter, such as API names, IDs, error codes, product names, and known phrases.

It is fast, predictable, and easy to debug.
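To make the three BM25 factors (term frequency, IDF, length normalization) concrete, here is a minimal scorer. It is a sketch under stated assumptions: naive lowercase whitespace tokenization, the typical parameter values k1 = 1.2 and b = 0.75, and a smoothed IDF that stays positive. The function name `bm25_scores` is hypothetical; real engines use their own analyzers and IDF variants.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score every document in `docs` against `query` with BM25.

    Tokenization is a naive lowercase whitespace split (an assumption for this sketch).
    """
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)  # term frequency inside this document
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Inverse document frequency: rare terms (IDs, error codes) weigh more.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: longer documents need more hits for the same credit.
            norm = k1 * (1 - b + b * len(toks) / avg_len)
            # Term-frequency saturation: repeated hits add less and less.
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(score)
    return scores


docs = [
    "POST /v1/payments returns 500 error on timeout",
    "refund requests must be submitted within 30 days",
    "GET /accounts returns 403 when permission is denied",
]
print(bm25_scores("POST /v1/payments 500 error", docs))
```

Only the first document contains the query terms, so it is the only one with a non-zero score; a document that matches a rare term like "403" outranks one that matches only a common term like "returns".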

5️⃣ Vector Search

What Is Vector Search?

Vector search retrieves content by semantic similarity.

It uses embeddings.

Query
→ Embedding Model
→ Query Vector
→ Nearest Neighbor Search
→ Similar Chunks

Strengths

Vector search is strong for:

Natural language questions
Synonyms
Paraphrases
Conceptual similarity
Vague queries
Unknown exact wording

Example

User asks:
"Why can't customers complete checkout?"

Vector search may retrieve:
"Payment authorization failures during checkout."

👉 Interview Answer

Vector search retrieves semantically similar content using embeddings.

It is useful when users ask natural language questions or when the query and document use different wording.
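The vector path can be sketched as cosine similarity plus nearest-neighbor ranking. This is a brute-force illustration, assuming the embeddings are already computed: the 3-dimensional vectors below are made up, and the chunk IDs and function names are hypothetical. Production systems use an embedding model and an approximate nearest-neighbor index instead of scanning every vector.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_chunks(query_vec, chunk_vecs, top_k=2):
    """Exact (brute-force) nearest-neighbor search; returns (chunk_id, similarity) pairs."""
    scored = [(cid, cosine(query_vec, vec)) for cid, vec in chunk_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy "embeddings", invented for illustration; real models output hundreds of dimensions.
chunks = {
    "checkout-auth-failures": [0.9, 0.1, 0.0],
    "refund-policy": [0.1, 0.9, 0.1],
    "office-hours": [0.0, 0.1, 0.9],
}
query = [0.8, 0.2, 0.1]  # stands in for the embedding of "Why can't customers complete checkout?"
result = nearest_chunks(query, chunks)
print(result)
```

The checkout chunk comes back first even though it shares no literal wording with the question; in this toy setup that is purely because its vector points in a similar direction.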

6️⃣ High-Level Hybrid Architecture

Architecture

User Query
→ Query Preprocessor
→ BM25 Retriever + Vector Retriever (in parallel)
→ Candidate Merger
→ Score Normalizer
→ Re-ranker
→ Context Builder
→ LLM

Two Retrieval Paths

BM25 Path

Query
→ Inverted Index
→ Lexical Candidates

Vector Path

Query
→ Embedding Model
→ Vector Index
→ Semantic Candidates

Merge Path

Lexical Candidates
+ Semantic Candidates
→ Merge
→ Re-rank
→ Final Results

👉 Interview Answer

A hybrid search architecture usually runs lexical retrieval and vector retrieval in parallel.

The system then merges candidates, normalizes scores, applies metadata filters, re-ranks results, and sends the best chunks to the LLM.
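Running the two retrievers in parallel keeps retrieval latency close to the slower of the two paths rather than their sum. A minimal sketch with Python's standard thread pool, assuming `bm25_retrieve` and `vector_retrieve` are hypothetical stand-ins for network calls to a keyword index and a vector index:

```python
from concurrent.futures import ThreadPoolExecutor


def bm25_retrieve(query, k):
    """Hypothetical stand-in for a call to a keyword (inverted) index."""
    return ["doc-err-504", "doc-payments-api", "doc-timeouts"][:k]


def vector_retrieve(query, k):
    """Hypothetical stand-in for a call to a vector index."""
    return ["doc-timeouts", "doc-checkout-failures", "doc-err-504"][:k]


def hybrid_candidates(query, k=100):
    """Submit both retrievals at once; latency is roughly max(bm25, vector), not the sum."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        lexical = pool.submit(bm25_retrieve, query, k)
        semantic = pool.submit(vector_retrieve, query, k)
        return lexical.result(), semantic.result()


lexical, semantic = hybrid_candidates("ERR_PAYMENT_TIMEOUT_504 at checkout")
print(lexical)
print(semantic)
```

Threads suit I/O-bound calls like these; an async client would serve the same purpose.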

7️⃣ Candidate Generation

Why Candidate Generation Matters

At scale, we cannot deeply rank every document.

So we first generate candidates.

Candidate Sources

BM25 returns top 100 lexical candidates.
Vector search returns top 100 semantic candidates.

Then the system merges them.

Example

BM25 top 100
+
Vector top 100
→ Merge into candidate set
→ Deduplicate
→ Re-rank

👉 Interview Answer

Hybrid search usually starts with candidate generation.

BM25 retrieves lexical candidates, vector search retrieves semantic candidates, and the system merges and deduplicates them before re-ranking.
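Merging and deduplicating can be as simple as a union keyed by document ID. The sketch below (function name and output shape are hypothetical) also records each document's 1-based rank in each list, which the fusion step needs and which makes "which path found this?" easy to log.

```python
def merge_candidates(lexical, semantic):
    """Union two ranked lists, dropping duplicates and remembering where each doc came from.

    Returns {doc_id: {"bm25_rank": int or None, "vector_rank": int or None}} with 1-based ranks.
    """
    merged = {}
    for source, ranked in (("bm25_rank", lexical), ("vector_rank", semantic)):
        for rank, doc_id in enumerate(ranked, start=1):
            entry = merged.setdefault(doc_id, {"bm25_rank": None, "vector_rank": None})
            entry[source] = rank
    return merged


lexical = ["a", "b", "c"]   # e.g. BM25 top-k
semantic = ["c", "d", "a"]  # e.g. vector top-k
merged = merge_candidates(lexical, semantic)
print(merged)
```

Six retrieved IDs collapse to four unique candidates; "a" and "c" carry both ranks, so they are the documents both paths agreed on.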

8️⃣ Score Normalization

Why Scores Are Hard to Combine

BM25 scores and vector similarity scores are different.

BM25 score: 17.4
Vector cosine similarity: 0.82

They are not directly comparable.

Normalization and Fusion Approaches

Min-max normalization
Z-score normalization
Rank-based fusion
Reciprocal Rank Fusion
Learned ranking model

Important Point

Do not simply add raw BM25 and vector scores.

👉 Interview Answer

BM25 scores and vector similarity scores are on different scales.

A production hybrid system must normalize or fuse scores carefully.

Rank-based fusion or learned ranking is often more reliable than adding raw scores.
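A small worked example shows why raw addition is misleading and what min-max normalization does. The scores are invented for illustration, and the weight `alpha` and the rule "a missing score counts as 0" are assumptions of this sketch, not a standard.

```python
def min_max(scores):
    """Rescale a {doc_id: score} dict to [0, 1]; if all scores are equal, map them to 1.0."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def weighted_fusion(bm25, vector, alpha=0.5):
    """alpha * normalized BM25 + (1 - alpha) * normalized vector score; missing scores count as 0."""
    nb, nv = min_max(bm25), min_max(vector)
    docs = set(nb) | set(nv)
    return {d: alpha * nb.get(d, 0.0) + (1 - alpha) * nv.get(d, 0.0) for d in docs}


bm25 = {"A": 17.4, "B": 9.1, "C": 3.0}
vector = {"A": 0.61, "B": 0.82, "C": 0.79}

raw = {d: bm25[d] + vector[d] for d in bm25}
print(max(raw, key=raw.get))  # A: BM25 magnitudes swamp the cosine scores
fused = weighted_fusion(bm25, vector)
ranked = sorted(fused.items(), key=lambda kv: kv[1], reverse=True)
print(ranked)
```

After normalization, B (strong in both signals) overtakes A. Min-max is sensitive to outliers in each result list, which is one reason rank-based fusion is often preferred.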

9️⃣ Reciprocal Rank Fusion

What Is RRF?

Reciprocal Rank Fusion combines ranked lists based on result positions.

It does not require raw scores to be comparable.

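RRF score(d) = Σ over result lists of 1 / (k + rank_d)

where rank_d is the document's 1-based position in that list and k is a constant (k = 60 is a common default) that dampens the gap between the very top ranks.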

Why It Works

RRF rewards documents that rank highly in either search method, and documents that appear in both lists accumulate score from each.

Example

Document A:
Rank 2 in BM25
Rank 8 in Vector Search

Document B:
Rank 20 in BM25
Rank 1 in Vector Search

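Both can be strong candidates. With k = 60, A scores 1/62 + 1/68 ≈ 0.0308 and B scores 1/80 + 1/61 ≈ 0.0289, so both rank near the top even though each is weak in one list.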

👉 Interview Answer

Reciprocal Rank Fusion is a common way to merge BM25 and vector search results.

It uses rank positions instead of raw scores, which avoids the problem of comparing BM25 scores directly with vector similarity scores.
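RRF fits in a few lines. This sketch uses k = 60 and builds two 20-item lists (filler IDs are hypothetical) that place Document A and Document B at the ranks from the example above.

```python
from collections import defaultdict


def rrf(ranked_lists, k=60):
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank of d), ranks 1-based.

    A document missing from a list simply gets no contribution from that list.
    """
    scores = defaultdict(float)
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


bm25_list = [f"bm25_{i}" for i in range(1, 21)]
vector_list = [f"vec_{i}" for i in range(1, 21)]
bm25_list[1], bm25_list[19] = "A", "B"     # A at rank 2, B at rank 20 in BM25
vector_list[7], vector_list[0] = "A", "B"  # A at rank 8, B at rank 1 in vector search

fused = rrf([bm25_list, vector_list])
print([doc for doc, _ in fused[:2]])  # ['A', 'B']
```

No raw score is ever compared, so the same function works for any number of retrievers whose score scales differ.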

🔟 Metadata Filtering

Why Filtering Matters

Hybrid search must still respect constraints.

Examples:

Permissions
Tenant
Language
Region
Product
Document type
Freshness
Source authority

Filter Flow

User Query
→ Permission Filter
→ BM25 + Vector Retrieval
→ Candidate Merge
→ Re-rank

Important Rule

Unauthorized chunks should never reach the LLM.

👉 Interview Answer

Metadata filtering is critical in hybrid search systems.

The system should enforce permissions, freshness, tenant, region, language, and document-type filters before context is sent to the LLM.
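A permission check on the candidate set can be sketched as a tenant match plus an ACL-group intersection. The `Chunk` fields and the group model are assumptions for illustration. In production the same constraint is usually pushed into both index queries (pre-filtering), with a check like this kept as a final safety net, because post-filtering a top-k list can leave too few results.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """Hypothetical chunk record carrying the metadata needed for access control."""
    chunk_id: str
    tenant: str
    allowed_groups: set = field(default_factory=set)
    text: str = ""


def authorized(chunks, user_tenant, user_groups):
    """Keep only chunks from the user's tenant that share at least one ACL group with the user."""
    return [
        c for c in chunks
        if c.tenant == user_tenant and c.allowed_groups & user_groups
    ]


chunks = [
    Chunk("c1", "acme", {"eng"}),
    Chunk("c2", "acme", {"finance"}),
    Chunk("c3", "globex", {"eng"}),
]
visible = authorized(chunks, "acme", {"eng", "support"})
print([c.chunk_id for c in visible])
```

Only "c1" survives: "c2" fails the group check and "c3" belongs to another tenant, so neither can reach the LLM.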

1️⃣1️⃣ Re-ranking

Why Re-ranking Is Needed

Initial retrieval is optimized for recall.

Re-ranking improves precision.

Re-ranking Flow

Merged Candidates
→ Cross-encoder / LLM Ranker / Learning-to-rank Model
→ Top Results

Benefits

Better relevance
Less noise
Better context quality
Lower hallucination risk

Trade-off

Re-ranking improves quality, but adds latency and cost.

👉 Interview Answer

Hybrid search often uses re-ranking after merging candidates.

BM25 and vector search generate broad candidates, while the re-ranker selects the most relevant chunks.

This improves precision but adds latency and cost.
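The re-ranking step is a second, more expensive scoring pass over a capped candidate list. In this sketch `score_fn` is a pluggable scorer; `overlap_score` is a toy stand-in for a cross-encoder or LLM ranker, and all names here are hypothetical. The `max_candidates` cap is where the latency and cost trade-off is controlled.

```python
def rerank(query, candidates, score_fn, top_k=3, max_candidates=50):
    """Score at most `max_candidates` merged candidates with an expensive scorer; keep the best `top_k`.

    `candidates` maps chunk_id -> text, in fused order.
    """
    pool = list(candidates.items())[:max_candidates]
    scored = [(cid, score_fn(query, text)) for cid, text in pool]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def overlap_score(query, text):
    """Toy stand-in for a cross-encoder: fraction of query tokens that appear in the text."""
    q = set(query.lower().split())
    return len(q & set(text.lower().split())) / len(q)


candidates = {
    "c2": "Checkout latency dashboard",
    "c3": "ERR_403 means permission denied",
    "c1": "GET /accounts returns ERR_403 when the token lacks the accounts:read scope",
}
top = rerank("why is GET /accounts failing with ERR_403", candidates, overlap_score, top_k=2)
print(top)
```

A real cross-encoder scores each (query, chunk) pair jointly, which is more accurate than retrieval-time scoring but costs one model call per candidate.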

1️⃣2️⃣ Hybrid Search in RAG

RAG Flow

User Question
→ Hybrid Retrieval
→ Re-rank Chunks
→ Build Context
→ LLM Answer with Citations

Why It Improves RAG

Hybrid search improves:

Exact match retrieval
Semantic recall
Source grounding
Citation quality
Answer accuracy

Example

Question:
"Why is GET /accounts failing with ERR_403?"

BM25 finds:
"GET /accounts"
"ERR_403"

Vector finds:
"authorization failure"
"permission denied"

Together they provide better context.

👉 Interview Answer

Hybrid search is especially useful for RAG because user queries often contain both natural language and exact terms.

BM25 captures exact identifiers, while vector search captures semantic context.

This improves answer grounding and citation quality.
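The "Build Context" step can be sketched as packing re-ranked chunks into numbered sources until a budget is reached, returning a map from citation number back to source ID. The character budget (instead of a token budget) and the `[n]` label format are assumptions of this sketch.

```python
def build_context(ranked_chunks, max_chars=1200):
    """Concatenate re-ranked chunks as numbered sources until a size budget is hit.

    `ranked_chunks` is a list of (source_id, text) in final rank order. The budget counts
    chunk text only (separators are ignored). The [n] labels let the LLM cite sources.
    """
    parts, citations, used = [], {}, 0
    for n, (source_id, text) in enumerate(ranked_chunks, start=1):
        block = f"[{n}] {text}"
        if used + len(block) > max_chars:
            break  # stop rather than cut a chunk mid-sentence
        parts.append(block)
        citations[n] = source_id
        used += len(block)
    return "\n\n".join(parts), citations


chunks = [
    ("runbook#403", "GET /accounts returns ERR_403 when the token lacks the accounts:read scope."),
    ("kb#authz", "Permission denied errors usually mean the caller's role is missing a grant."),
]
context, citations = build_context(chunks)
print(context)
print(citations)
```

Because the highest-ranked chunks are packed first, a tight budget drops the weakest evidence rather than the strongest.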

1️⃣3️⃣ Performance and Cost

Cost Drivers

Hybrid search is more expensive than single-method search.

It may require:

BM25 retrieval
Query embedding generation
Vector retrieval
Candidate merging
Re-ranking

Optimization

Run BM25 and vector search in parallel
Limit candidate counts
Cache query embeddings
Cache popular retrieval results
Use lightweight re-rankers
Apply filters early

👉 Interview Answer

Hybrid search improves quality, but costs more than using only keyword or vector search.

Production systems optimize by parallelizing retrieval, limiting candidate counts, caching embeddings, applying filters early, and using lightweight re-rankers.
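Caching query embeddings can be sketched with `functools.lru_cache` in front of the embedding call. `expensive_embed` is a hypothetical stand-in for a remote model call, and the light query normalization is an assumption: skip the case-folding if your identifiers are case-sensitive.

```python
from functools import lru_cache

CALLS = {"embed": 0}


def expensive_embed(text):
    """Hypothetical stand-in for a remote embedding-model call; returns a fake fixed-size vector."""
    CALLS["embed"] += 1
    return tuple(float(ord(ch) % 7) for ch in text[:4])


def normalize_query(query):
    """Collapse whitespace and case so trivially different queries share one cache entry."""
    return " ".join(query.lower().split())


@lru_cache(maxsize=10_000)
def _cached_embed(normalized):
    # Returns a tuple, so cached vectors cannot be mutated by callers.
    return expensive_embed(normalized)


def embed_query(query):
    return _cached_embed(normalize_query(query))


embed_query("Refund policy")
embed_query("  refund   POLICY ")
print(CALLS["embed"])  # 1: the second query hit the cache
```

An in-process LRU only helps a single worker; a shared cache keyed by the normalized query and the embedding-model version serves the same role across a fleet.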

1️⃣4️⃣ Common Failure Modes

Failure Modes

Hybrid systems can fail because of:

Poor score normalization
Bad chunking
Weak embeddings
BM25 over-weighting exact terms
Vector search retrieving broad but irrelevant chunks
Duplicate candidates
Missing metadata filters
Re-ranker latency
Permission leaks

Example

BM25 finds the exact error code,
but the document is outdated.

Vector search finds a newer semantic match,
but it has a lower lexical score.

Fusion that ignores freshness ranks the old result first.

👉 Interview Answer

Hybrid search failure often happens during merging and ranking.

If lexical scores, semantic scores, freshness, and permissions are not handled correctly, the final ranking may be misleading.
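One way to handle the outdated-runbook case is to apply a freshness adjustment after fusion. This sketch multiplies RRF scores by an exponential decay; the 180-day half-life, the document IDs, and the ages are all invented for illustration, and whether freshness should matter at all depends on the content type.

```python
def rrf_scores(ranked_lists, k=60):
    """Plain RRF scores as a dict: sum over lists of 1 / (k + 1-based rank)."""
    scores = {}
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return scores


def freshness_adjusted(scores, age_days, half_life_days=180.0):
    """Multiply each fused score by an exponential decay: the weight halves every `half_life_days`."""
    return {d: s * 0.5 ** (age_days[d] / half_life_days) for d, s in scores.items()}


bm25_list = ["old-runbook", "x1", "x2", "x3", "new-runbook"]  # exact error code favors the old doc
vector_list = ["new-runbook", "x4", "old-runbook"]            # semantic match favors the new doc
age_days = {"old-runbook": 720, "new-runbook": 10, "x1": 30, "x2": 30, "x3": 30, "x4": 30}

plain = rrf_scores([bm25_list, vector_list])
adjusted = freshness_adjusted(plain, age_days)
print(max(plain, key=plain.get))        # old-runbook
print(max(adjusted, key=adjusted.get))  # new-runbook
```

A multiplicative decay is blunt; a learned ranker that takes freshness as one feature is a common alternative.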

1️⃣5️⃣ Design Framework

Use BM25 For

Exact terms
Error codes
IDs
API names
Legal phrases
Product names

Use Vector Search For

Natural language questions
Synonyms
Paraphrases
Conceptual meaning
Vague queries

Use Re-ranking For

Final precision
Context quality
Reducing noise
Selecting best chunks for LLM

Use Metadata For

Permissions
Freshness
Filtering
Source authority
Compliance

👉 Interview Answer

My design framework is: use BM25 for exact matching, vector search for semantic recall, metadata filters for constraints, and re-ranking for final precision.

1️⃣6️⃣ Best Practices

Practical Rules

Run BM25 and vector search in parallel
Do not add raw scores directly
Use rank fusion or learned ranking
Apply metadata filters early
Deduplicate candidates
Re-rank merged results
Track freshness and source quality
Log retrieval paths
Evaluate BM25, vector, and hybrid separately
Tune candidate counts experimentally
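Evaluating BM25, vector, and hybrid separately needs a labeled query set and a retrieval metric. A minimal recall@k sketch for one hypothetical labeled query; in practice you would average over many queries and often add MRR or nDCG.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant docs that appear in the top-k retrieved results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


# Hypothetical judgments: the docs a human marked relevant for one query.
relevant = {"err-504-runbook", "payment-timeout-faq"}
runs = {
    "bm25": ["err-504-runbook", "changelog", "status-page"],
    "vector": ["payment-timeout-faq", "checkout-guide", "status-page"],
    "hybrid": ["err-504-runbook", "payment-timeout-faq", "checkout-guide"],
}
results = {name: recall_at_k(retrieved, relevant, k=3) for name, retrieved in runs.items()}
print(results)
```

Comparing the three runs on the same judgments shows whether fusion actually adds value, and which path is responsible when it does not.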

Design Principle

BM25 gives exactness.
Vector search gives meaning.
Hybrid search gives robustness.

👉 Interview Answer

A strong hybrid search system combines multiple retrieval signals.

BM25 provides exact matching, vector search provides semantic recall, metadata filters provide constraints, and re-ranking provides final precision.

🧠 Final Staff-Level Answer

👉 Interview Answer (Full Version)

Hybrid search combines BM25 keyword retrieval with vector semantic retrieval.

The reason this architecture is useful is that keyword search and vector search fail in different ways.

BM25 is strong for exact terms: IDs, error codes, API names, product names, legal phrases, and known keywords.

It is fast, predictable, and easy to debug.

But it struggles when the user asks a natural language question using different wording from the document.

Vector search solves that by using embeddings to retrieve semantically similar content.

It works well for paraphrases, synonyms, vague queries, and conceptual questions.

But vector search can be weak for exact identifiers, rare terms, acronyms, and error codes.

In a production hybrid system, the query usually goes through both paths: a BM25 retriever over an inverted index, and a vector retriever over an embedding index.

Each returns a candidate set.

The system then deduplicates candidates, applies metadata filters, normalizes or fuses rankings, and often runs a re-ranker to select the final top results.

Score merging is one of the hardest parts because BM25 scores and vector similarity scores are not directly comparable.

A common approach is rank-based fusion, such as Reciprocal Rank Fusion, or a learned ranking model.

For RAG systems, hybrid search is often the best default because user queries frequently contain both natural language intent and exact technical terms.

For example, a query may include an API endpoint, an error code, and a natural language description.

BM25 captures the endpoint and error code, while vector search captures the semantic meaning.

The final design should also include metadata filtering for permissions, freshness, document type, tenant, language, and source authority.

The core principle is: BM25 gives exactness, vector search gives meaning, and hybrid search gives robustness.

⭐ Final Insight

The core of hybrid search is not simply:

“Add the BM25 and vector scores together”

A real production hybrid search system combines:

BM25 Candidate Generation

Vector Candidate Generation

Metadata Filtering

Score Fusion

Deduplication

Re-ranking

Access Control

Observability

BM25 handles exact matching.

Vector search handles semantic meaning.

The re-ranker handles final precision.

The single most important takeaway:

BM25 gives exactness.

Vector search gives meaning.

Hybrid search gives robustness.
