🎯 RAG Architecture Explained for Engineers
1️⃣ Core Framework
When discussing RAG Architecture, I frame it as:
- Why RAG is needed
- Knowledge ingestion
- Chunking and embeddings
- Vector storage and indexing
- Query rewriting and retrieval
- Ranking and context building
- Generation with citations
- Trade-offs: accuracy vs latency vs cost
2️⃣ What Is RAG?
RAG stands for Retrieval-Augmented Generation.
It combines:
External Knowledge Retrieval
+ LLM Generation
Instead of asking the LLM to answer only from model memory, the system retrieves relevant knowledge at runtime and gives it to the model as context.
Basic Flow
User Question
→ Retrieve relevant documents
→ Add documents to prompt
→ LLM generates grounded answer
👉 Interview Answer
RAG is an architecture where the system retrieves relevant external knowledge at runtime and provides that knowledge to the LLM as context.
This helps the model answer questions using private, updated, or domain-specific information instead of relying only on model memory.
3️⃣ Why Do We Need RAG?
LLM Limitations
LLMs may not know:
- Private company data
- Recent information
- Internal documentation
- Customer-specific data
- Domain-specific policies
- Large knowledge bases
Without RAG
User asks about internal policy
→ LLM guesses
→ Risk of hallucination
With RAG
User asks about internal policy
→ System retrieves policy document
→ LLM answers using retrieved context
👉 Interview Answer
RAG is useful because LLMs do not automatically know private or updated information.
By retrieving relevant documents at runtime, the system can ground the answer in real sources, reduce hallucination, and support enterprise knowledge use cases.
4️⃣ High-Level RAG Architecture
Architecture
Documents
→ Ingestion Pipeline
→ Chunking
→ Embedding Model
→ Vector Database
→ Retriever
→ Ranker
→ Prompt Builder
→ LLM
→ Answer with Sources
Two Main Paths
Offline Path
Documents
→ Clean
→ Chunk
→ Embed
→ Store
Online Path
User Query
→ Embed Query
→ Retrieve Chunks
→ Rank Results
→ Build Prompt
→ Generate Answer
👉 Interview Answer
RAG usually has two paths.
The offline ingestion path processes documents, chunks them, creates embeddings, and stores them in a search index or vector database.
The online query path retrieves relevant chunks, ranks them, builds the prompt, and sends the grounded context to the LLM.
5️⃣ Ingestion Pipeline
What Is Ingestion?
Ingestion is the process of preparing knowledge for retrieval.
Ingestion Steps
Raw Documents
→ Parse
→ Clean
→ Normalize
→ Chunk
→ Embed
→ Store
Input Sources
- PDFs
- Markdown files
- Web pages
- Internal docs
- Tickets
- Incident reports
- Code repositories
- Database records
Why Ingestion Matters
Bad ingestion leads to bad retrieval.
👉 Interview Answer
The ingestion pipeline prepares documents for RAG.
It parses raw data, cleans text, splits documents into chunks, generates embeddings, and stores them in a retrievable index.
Ingestion quality directly affects retrieval quality.
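The parse and clean steps above can be sketched with the Python standard library alone. This is a minimal illustration, assuming HTML input; `parse_html`, `clean`, and the sample document are hypothetical names, and a production pipeline would likely use a dedicated parser per source type.

```python
import re
import unicodedata
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes and ignores tags (a stand-in for a real parser)."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def parse_html(raw: str) -> str:
    """Parse step: strip markup and keep only the visible text."""
    extractor = _TextExtractor()
    extractor.feed(raw)
    extractor.close()
    return " ".join(extractor.parts)


def clean(text: str) -> str:
    """Clean/normalize step: unify Unicode forms and collapse whitespace."""
    text = unicodedata.normalize("NFKC", text)
    return re.sub(r"\s+", " ", text).strip()


raw_doc = "<h1>Refund   Policy</h1>\n<p>Refunds are allowed within 30&nbsp;days.</p>"
print(clean(parse_html(raw_doc)))
```

The cleaned text is what the chunking and embedding steps would receive; leftover markup or stray whitespace at this stage ends up inside every chunk.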
6️⃣ Chunking
What Is Chunking?
Chunking means splitting large documents into smaller pieces.
Long document
→ Chunk 1
→ Chunk 2
→ Chunk 3
Why Chunking Is Needed
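Each chunk gets its own embedding, so a focused chunk matches a query more precisely than a whole document, and smaller pieces fit more easily into the LLM's context window.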
Chunking Strategies
| Strategy | Use Case |
|---|---|
| Fixed-size chunking | Simple documents |
| Semantic chunking | Prose where topic boundaries matter |
| Section-based chunking | Markdown / docs |
| Sliding window | Keep context that spans chunk boundaries |
| Code-aware chunking | Code repositories |
Chunk Size Trade-off
| Chunk Size | Strength | Weakness |
|---|---|---|
| Small chunks | Precise retrieval | May lose context |
| Large chunks | More context | Less precise and more costly |
👉 Interview Answer
Chunking is important because retrieval happens at the chunk level.
If chunks are too small, the system may lose context.
If chunks are too large, retrieval becomes less precise and prompts become expensive.
Good chunking balances precision and context.
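As one illustration of the size trade-off, here is a minimal sliding-window chunker. It is a sketch that splits on whitespace-separated words; the function name `chunk_words` and the default `size` and `overlap` values are assumptions, and real systems often count tokens rather than words.

```python
def chunk_words(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into chunks of `size` words, each sharing `overlap` words
    with the previous chunk."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
for chunk in chunk_words(sample, size=4, overlap=1):
    print(chunk)
```

A larger `size` keeps more context per chunk but makes each match less precise; a larger `overlap` reduces the chance that a sentence is cut in half at a boundary, at the cost of storing more text.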
7️⃣ Embeddings
What Is an Embedding?
An embedding converts text into a vector.
"refund policy"
→ [0.12, -0.45, 0.89, ...]
Texts with similar meaning tend to have vectors that are close together under a similarity measure such as cosine similarity.
Embedding Flow
Document chunk
→ Embedding model
→ Vector
→ Store in vector database
Query Embedding
User query
→ Embedding model
→ Query vector
→ Search similar document vectors
👉 Interview Answer
Embeddings allow semantic search.
The system converts document chunks and user queries into vectors, then searches for chunks with similar meaning.
This is the foundation of vector-based RAG retrieval.
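To make "similar vectors" concrete, the sketch below ranks texts by cosine similarity. The three-dimensional vectors are hand-written toy values, not the output of any real embedding model, and `toy_vectors` is a hypothetical name.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy vectors chosen by hand so that the two refund texts point the same way.
toy_vectors = {
    "refund policy": [0.9, 0.1, 0.0],
    "money-back rules": [0.8, 0.2, 0.1],
    "server restart guide": [0.0, 0.1, 0.9],
}

query_vector = toy_vectors["refund policy"]
ranked = sorted(
    toy_vectors,
    key=lambda name: cosine_similarity(query_vector, toy_vectors[name]),
    reverse=True,
)
print(ranked)
```

In a real system the vectors come from an embedding model, and the same model must be used for both documents and queries so that the two live in the same vector space.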
8️⃣ Vector Database
What Does Vector DB Store?
A vector database stores:
- Chunk text
- Embedding vector
- Document ID
- Metadata
- Source URL
- Timestamp
- Access control tags
Example Record
{
  "chunk_id": "chunk_123",
  "document_id": "doc_456",
  "text": "Refunds are allowed within 30 days...",
  "embedding": [0.12, -0.45, 0.89],
  "metadata": {
    "source": "refund_policy.md",
    "updated_at": "2026-05-24",
    "department": "support"
  }
}
Why Metadata Matters
Metadata supports:
- Filtering
- Access control
- Freshness ranking
- Citation generation
- Debugging
👉 Interview Answer
A vector database stores embeddings, document chunks, and metadata.
Metadata is important because it enables filtering, access control, freshness checks, citation generation, and debugging.
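The record shape above maps naturally onto a small data class. The following is a sketch of metadata filtering and freshness ordering over an in-memory list; `ChunkRecord`, `filter_records`, and the second sample record are hypothetical, and a real vector database would apply such filters inside its own query engine.

```python
from dataclasses import dataclass, field


@dataclass
class ChunkRecord:
    chunk_id: str
    document_id: str
    text: str
    embedding: list[float]
    metadata: dict[str, str] = field(default_factory=dict)


def filter_records(records: list[ChunkRecord], **required: str) -> list[ChunkRecord]:
    """Keep records whose metadata matches every required key/value pair."""
    return [
        r for r in records
        if all(r.metadata.get(key) == value for key, value in required.items())
    ]


records = [
    ChunkRecord("chunk_123", "doc_456", "Refunds are allowed within 30 days...",
                [0.12, -0.45, 0.89],
                {"source": "refund_policy.md", "updated_at": "2026-05-24",
                 "department": "support"}),
    # Hypothetical second record, added only to show filtering.
    ChunkRecord("chunk_200", "doc_900", "Restart the service after deploy...",
                [0.40, 0.10, -0.30],
                {"source": "runbook.md", "updated_at": "2025-01-10",
                 "department": "infra"}),
]

support_only = filter_records(records, department="support")
# ISO dates sort correctly as plain strings, so this orders newest first.
newest_first = sorted(records, key=lambda r: r.metadata["updated_at"], reverse=True)
print([r.chunk_id for r in support_only], [r.chunk_id for r in newest_first])
```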
9️⃣ Retrieval
Retrieval Flow
User Query
→ Query Embedding
→ Vector Search
→ Candidate Chunks
→ Ranking
→ Selected Context
Retrieval Types
| Type | Description |
|---|---|
| Vector search | Semantic similarity |
| Keyword search | Exact term matching |
| Hybrid search | Vector + keyword |
| Metadata filtering | Filter by source, date, permission |
| Graph retrieval | Relationship-aware retrieval |
Why Hybrid Search Is Common
Vector search is good for meaning.
Keyword search is good for exact terms.
Hybrid search combines both.
👉 Interview Answer
Retrieval is the process of finding relevant chunks for the user query.
Many production RAG systems use hybrid retrieval, combining vector search for semantic similarity with keyword search for exact matches.
Metadata filtering is also important for permissions and freshness.
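One common way to combine a vector ranking with a keyword ranking is reciprocal rank fusion (RRF). The notes above do not name a fusion method, so treat this as one possible choice rather than the required one; the chunk IDs and the constant `k = 60` are illustrative assumptions.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of chunk IDs into one list.

    Each list contributes 1 / (k + rank) to a chunk's score, so chunks that
    rank well in more than one list rise to the top.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Sort by score descending; break ties by ID for a stable result.
    return sorted(scores, key=lambda chunk_id: (-scores[chunk_id], chunk_id))


vector_hits = ["c2", "c1", "c4"]   # hypothetical result of vector search
keyword_hits = ["c1", "c3", "c2"]  # hypothetical result of keyword search
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)
```

RRF only needs ranks, not raw scores, which avoids having to put vector distances and keyword scores on a common scale.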
🔟 Ranking and Re-ranking
Why Ranking Is Needed
Initial retrieval may return noisy results.
The system must decide which chunks are most useful.
Ranking Signals
- Semantic relevance
- Keyword match
- Freshness
- Source authority
- User permission (applied as a hard filter, not a soft score)
- Document type
- Historical usefulness
Re-ranking Flow
Retrieve top 50 chunks
→ Re-ranker scores chunks
→ Select top 5 to 10 chunks
→ Add to prompt
👉 Interview Answer
Retrieval alone is often not enough.
Production RAG systems usually rank or re-rank candidate chunks before adding them to the prompt.
This improves relevance, reduces noise, and controls context size.
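The retrieve-then-re-rank flow can be sketched as a two-stage function. Here `overlap_score` is a deliberately simple word-overlap scorer standing in for a real re-ranker (for example a cross-encoder model); it and the sample candidates are assumptions for illustration only.

```python
def overlap_score(query: str, text: str) -> float:
    """Toy relevance score: fraction of query words that appear in the text."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & text_words) / len(query_words)


def rerank(query: str, candidates: list[str], top_k: int = 5) -> list[str]:
    """Score every candidate against the query and keep the best top_k."""
    ordered = sorted(candidates, key=lambda text: overlap_score(query, text), reverse=True)
    return ordered[:top_k]


candidates = [
    "Restart the server within 5 minutes",
    "Refunds are allowed within 30 days of purchase",
    "Shipping takes 30 days",
]
selected = rerank("refund within 30 days", candidates, top_k=2)
print(selected)
```

The first stage can afford to be broad and cheap because the second stage trims the list before anything reaches the prompt.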
1️⃣1️⃣ Prompt Building
What Prompt Builder Does
The prompt builder combines:
- System instruction
- User question
- Retrieved context
- Citation metadata
- Output format
- Safety constraints
Prompt Structure
System instruction
User question
Retrieved context
Rules:
- Answer only from provided context
- Cite sources
- Say when context is insufficient
Output format
Important Rule
The prompt should mark retrieved context clearly and keep it separate from the instructions, so the model knows which text it may rely on and cite.
👉 Interview Answer
The prompt builder decides how retrieved knowledge is presented to the LLM.
It should include the user question, relevant chunks, citation metadata, output format, and instructions for handling insufficient context.
Good prompt construction improves factuality and consistency.
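A prompt builder along these lines might look like the sketch below. The wording of the instructions, the `build_prompt` name, and the `source`/`text` keys are assumptions; the point is only that context is numbered, attributed, and kept apart from the rules.

```python
def build_prompt(question: str, chunks: list[dict[str, str]]) -> str:
    """Assemble instructions, numbered context, and the question into one prompt."""
    context = "\n".join(
        f"[{i}] (source: {chunk['source']}) {chunk['text']}"
        for i, chunk in enumerate(chunks, start=1)
    )
    return (
        "You are a support assistant.\n"
        "Rules:\n"
        "- Answer only from the context below.\n"
        "- Cite sources by their bracketed number, e.g. [1].\n"
        "- If the context is insufficient, say so instead of guessing.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


chunks = [{"source": "refund_policy.md", "text": "Refunds are allowed within 30 days..."}]
prompt = build_prompt("What is the refund window?", chunks)
print(prompt)
```

Numbering each chunk gives the model a short handle to cite, and the application can map `[1]` back to the source file when it displays the answer.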
1️⃣2️⃣ Generation
Generation Step
The LLM receives retrieved context and generates the answer.
Retrieved Context
+ User Question
+ Instructions
→ LLM Answer
Good RAG Answer Should
- Use retrieved context
- Avoid unsupported claims
- Cite sources
- Mention uncertainty
- Say when context is insufficient
- Avoid hallucination
Failure Example
Retrieved context does not answer question
→ LLM guesses anyway
👉 Interview Answer
In the generation step, the LLM should answer using retrieved context.
If the retrieved context is insufficient, the system should instruct the model to say so instead of guessing.
This is critical for reducing hallucination.
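Besides instructing the model, the application can refuse before calling the LLM at all when retrieval found nothing usable. This is a sketch under assumptions: `llm` is any callable that maps a prompt to text, the relevance scores and the `min_score` threshold are hypothetical, and `fake_llm` exists only so the example runs without a model.

```python
from typing import Callable

INSUFFICIENT = "I don't have enough information in the provided sources to answer that."


def generate_answer(
    question: str,
    scored_chunks: list[tuple[float, str]],
    llm: Callable[[str], str],
    min_score: float = 0.5,
) -> str:
    """Call the LLM only if at least one chunk clears the relevance threshold."""
    usable = [text for score, text in scored_chunks if score >= min_score]
    if not usable:
        return INSUFFICIENT  # skip the model call instead of letting it guess
    prompt = (
        "Answer only from the context. If it is insufficient, say so.\n\n"
        "Context:\n" + "\n".join(usable) + f"\n\nQuestion: {question}"
    )
    return llm(prompt)


calls: list[str] = []


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model client; records the prompt it was given."""
    calls.append(prompt)
    return "Refunds are allowed within 30 days."


good = generate_answer("Refund window?", [(0.9, "Refunds are allowed within 30 days...")], fake_llm)
weak = generate_answer("Parking rules?", [(0.1, "Refunds are allowed within 30 days...")], fake_llm)
print(good)
print(weak)
```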
1️⃣3️⃣ Access Control
Why Access Control Matters
Enterprise RAG may contain sensitive documents.
Users should only retrieve documents they are allowed to see.
Access Control Points
- During ingestion
- During retrieval filtering
- During prompt construction
- During citation display
- During logging
Example
User from Team A
→ Retrieve only documents allowed for Team A
👉 Interview Answer
Access control is critical in enterprise RAG.
The retrieval system must filter documents based on user permissions before adding context to the prompt.
The LLM should never receive unauthorized content.
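A permission filter applied before prompt construction can be as small as the sketch below. The `allowed_groups` field and the group names are hypothetical; the design choice worth noting is default deny, so a chunk with no access tags is never returned.

```python
def filter_by_permission(chunks: list[dict], user_groups: set[str]) -> list[dict]:
    """Keep only chunks whose access tags intersect the user's groups.

    Chunks without any tags are dropped (default deny).
    """
    return [
        chunk for chunk in chunks
        if user_groups & set(chunk.get("allowed_groups", ()))
    ]


chunks = [
    {"chunk_id": "c1", "allowed_groups": ["team-a"]},
    {"chunk_id": "c2", "allowed_groups": ["team-b"]},
    {"chunk_id": "c3", "allowed_groups": ["team-a", "team-b"]},
    {"chunk_id": "c4"},  # untagged: excluded by default
]
visible = filter_by_permission(chunks, {"team-a"})
print([chunk["chunk_id"] for chunk in visible])
```

Running this filter on retrieval results, rather than asking the model to ignore restricted text, keeps unauthorized content out of the prompt entirely.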
1️⃣4️⃣ Evaluation
What to Evaluate
RAG systems need evaluation at multiple layers.
Retrieval Metrics
- Recall
- Precision
- Relevance
- Source freshness
- Permission correctness
Generation Metrics
- Factuality
- Faithfulness
- Citation quality
- Answer completeness
- Refusal when context is insufficient
Production Signals
- User feedback
- Click-through on citations
- Escalation rate
- Latency
- Cost
- Error rate
👉 Interview Answer
RAG evaluation should measure both retrieval quality and generation quality.
Good retrieval means the right documents are found.
Good generation means the answer is faithful to those documents, cites sources, and avoids unsupported claims.
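Retrieval precision and recall are usually computed over the top k results against a labeled set of relevant chunks. A minimal sketch, assuming hand-labeled chunk IDs:

```python
def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k retrieved chunks that are relevant."""
    top = retrieved[:k]
    if not top:
        return 0.0
    return sum(1 for chunk_id in top if chunk_id in relevant) / len(top)


def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of all relevant chunks that appear in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


retrieved = ["c1", "c7", "c3", "c9"]  # hypothetical retriever output, best first
relevant = {"c1", "c3", "c5"}         # hypothetical human labels for this query
print(precision_at_k(retrieved, relevant, k=3), recall_at_k(retrieved, relevant, k=3))
```

Low recall points at ingestion, chunking, or the retriever; high recall with poor answers points at ranking, prompt building, or generation.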
1️⃣5️⃣ Common Failure Modes
Failure Modes
RAG systems can fail because of:
- Bad ingestion
- Poor chunking
- Weak embeddings
- Wrong retrieval
- Missing metadata
- Stale documents
- Permission leaks
- Too much context
- LLM ignores context
- No evaluation loop
Example
Wrong chunk retrieved
→ LLM answers confidently
→ User receives incorrect answer
👉 Interview Answer
RAG failures can happen at any layer: ingestion, chunking, embedding, retrieval, ranking, prompt building, generation, or access control.
Debugging RAG requires tracing the full pipeline.
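Tracing the pipeline is easier if every request leaves one structured record behind. The sketch below shows one possible shape; the `RagTrace` fields and the version label are assumptions, not a standard schema.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class RagTrace:
    """One record per request, capturing what each stage produced."""
    query: str
    retrieved_chunk_ids: list[str] = field(default_factory=list)
    reranked_chunk_ids: list[str] = field(default_factory=list)
    prompt_version: str = "unknown"
    answer: str = ""


trace = RagTrace(query="What is the refund window?")
trace.retrieved_chunk_ids = ["chunk_123", "chunk_456"]
trace.reranked_chunk_ids = ["chunk_123"]
trace.prompt_version = "v1"  # hypothetical label for the prompt template in use
trace.answer = "Refunds are allowed within 30 days [1]."

log_line = json.dumps(asdict(trace))
print(log_line)
```

With records like this, a wrong answer can be traced to the stage where the right chunk dropped out: never retrieved, removed by re-ranking, or present in the prompt but ignored by the model.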
1️⃣6️⃣ Best Practices
Practical Rules
- Clean documents before indexing
- Use a chunking strategy suited to the document type
- Store metadata with chunks
- Use hybrid retrieval when needed
- Re-rank candidate chunks
- Enforce access control before prompt building
- Cite sources
- Evaluate retrieval and generation separately
- Log retrieved chunks and prompt versions
- Refresh stale documents
Design Principle
RAG quality depends more on retrieval and context engineering than on the LLM alone.
👉 Interview Answer
A good RAG system is not just a vector database plus an LLM.
It requires high-quality ingestion, chunking, metadata, retrieval, ranking, prompt building, access control, evaluation, and observability.
🧠 Final Staff-Level Answer
👉 Interview Answer (Full Version)
RAG, or Retrieval-Augmented Generation, is an architecture where the system retrieves relevant external knowledge at runtime and provides it to the LLM as context.
This is useful because LLMs do not automatically know private, recent, or domain-specific information.
A production RAG system usually has two paths: an offline ingestion path and an online query path.
The offline path parses documents, cleans text, chunks documents, generates embeddings, and stores chunks with metadata in a vector database or search index.
The online path receives the user query, optionally rewrites it, embeds it, retrieves candidate chunks, ranks or re-ranks them, builds the prompt, and sends the grounded context to the LLM.
The quality of RAG depends heavily on ingestion and retrieval.
Bad chunking, weak metadata, stale documents, or poor retrieval can cause the LLM to produce incorrect answers even if the model is strong.
In production, I would usually use hybrid retrieval, combining vector search for semantic similarity with keyword search for exact matches.
I would also store metadata such as document ID, source, timestamp, owner, access control tags, and freshness signals.
Before sending context to the model, the system should filter by permissions, rank results, remove irrelevant chunks, and keep context within token limits.
The prompt should instruct the model to answer only from retrieved context, cite sources, and say when the context is insufficient.
RAG systems also need evaluation and observability.
I would measure retrieval recall, precision, relevance, permission correctness, answer faithfulness, citation quality, latency, and cost.
The key point is that RAG is not just “vector database plus LLM.”
It is a full knowledge system with ingestion, indexing, retrieval, ranking, prompt building, generation, access control, evaluation, and monitoring.
⭐ Final Insight
The core of RAG is not “stuffing documents into the LLM.”
A real RAG architecture is:
- Ingestion
- Chunking
- Embeddings
- Vector / Hybrid Search
- Ranking
- Prompt Building
- Generation
- Citations
- Evaluation
- Access Control
In a good RAG system, quality depends mainly on retrieval and context engineering, not on the LLM alone.