🎯 Real-time RAG vs Batch RAG Systems
1️⃣ Core Framework
When comparing real-time RAG with batch RAG, I frame the comparison along these dimensions:
- Data freshness requirements
- Ingestion latency
- Index update architecture
- Query-time retrieval path
- Cost and operational complexity
- Consistency and correctness
- Failure handling
- Trade-offs: freshness vs stability
2️⃣ What Is Batch RAG?
Batch RAG updates the knowledge index periodically.
Documents are processed in scheduled jobs.
Documents
→ Batch ingestion job
→ Chunking
→ Embedding
→ Index update
→ Retrieval
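The batch pipeline above can be sketched in a few lines of Python. This is a minimal illustration, not a production design: `chunk`, `embed`, `build_index`, and `retrieve` are hypothetical names, and the bag-of-words `embed` is a toy stand-in for a real embedding model. The key batch property is visible in `build_index`: the whole index is rebuilt from the current documents in one scheduled pass.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple

def chunk(text: str, max_words: int = 50) -> List[str]:
    """Split a document into fixed-size word windows (one simple chunking strategy)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words term-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

Entry = Tuple[str, str, Counter]  # (doc_id, chunk_text, vector)

def build_index(docs: Dict[str, str]) -> List[Entry]:
    """Batch ingestion job: chunk and embed every document into a fresh index."""
    return [(doc_id, piece, embed(piece))
            for doc_id, text in docs.items()
            for piece in chunk(text)]

def retrieve(index: List[Entry], query: str, k: int = 2) -> List[Tuple[str, str]]:
    """Rank chunks by cosine similarity to the query and return the top k."""
    q = embed(query)
    ranked = sorted(index, key=lambda entry: cosine(q, entry[2]), reverse=True)
    return [(doc_id, piece) for doc_id, piece, _ in ranked[:k]]

docs = {
    "coding-standard": "All services must use structured logging and typed configuration.",
    "vacation-policy": "Employees accrue vacation days monthly and request leave in advance.",
}
index = build_index(docs)
print(retrieve(index, "what is our logging standard", k=1))
```

Between runs, any edit to `docs` is invisible to `retrieve` until `build_index` runs again, which is exactly the staleness window described above.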
Example Schedule
Every hour
Every night
Every weekend
Best For
Batch RAG is good for:
- Stable documentation
- Internal wiki pages
- Product docs
- Policy documents
- Historical reports
- Knowledge bases with slower updates
👉 Interview Answer
Batch RAG updates the retrieval index on a schedule.
Documents are collected, parsed, chunked, embedded, and indexed periodically.
It is simpler, cheaper, and more stable, but it may serve stale information between updates.
3️⃣ What Is Real-time RAG?
Real-time RAG Definition
Real-time RAG updates knowledge as soon as data changes, or retrieves fresh data directly at query time.
Two Common Forms
Real-time Index Update
Document changes
→ Event emitted
→ Re-chunk
→ Re-embed
→ Update index immediately
Query-time Fresh Retrieval
User query
→ Fetch latest data from source system
→ Add fresh context to prompt
Best For
Real-time RAG is useful for:
- Incident data
- Metrics
- Logs
- Customer account state
- Inventory
- Pricing
- News
- Support tickets
- Frequently changing operational data
👉 Interview Answer
Real-time RAG is designed for freshness-sensitive use cases.
It either updates the retrieval index immediately when data changes, or fetches fresh data directly from source systems at query time.
This improves freshness, but increases complexity, latency, and cost.
4️⃣ Core Difference
Batch RAG
Knowledge is refreshed periodically.
Real-time RAG
Knowledge is refreshed continuously or retrieved live.
Comparison Table
| Dimension | Batch RAG | Real-time RAG |
|---|---|---|
| Freshness | Lower | Higher |
| Complexity | Lower | Higher |
| Cost | Lower | Higher |
| Query latency | Lower and predictable | Higher and variable when live source calls are made |
| Stability | Higher | Lower |
| Best for | Stable docs | Dynamic data |
| Failure risk | Lower | Higher |
| Operational burden | Lower | Higher |
👉 Interview Answer
The main difference is freshness.
Batch RAG refreshes knowledge periodically, while real-time RAG updates or retrieves knowledge continuously.
Batch RAG is simpler and more stable.
Real-time RAG is fresher but more complex and expensive.
5️⃣ Batch RAG Architecture
High-Level Architecture
Document Sources
→ Scheduled Ingestion Job
→ Parser
→ Cleaner
→ Chunker
→ Embedding Generator
→ Vector / Search Index
→ Retriever
→ LLM
Batch Flow
Nightly job starts
→ Read changed documents
→ Process documents
→ Generate embeddings
→ Update index
→ Mark index version active
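One way to implement the "mark index version active" step is blue/green index versioning: build the new version off to the side, validate it, then flip a pointer. Queries never see a half-built index, and a failed or rejected build leaves the previous version serving. The sketch below is an in-memory illustration under that assumption; `IndexRegistry`, `run_batch_job`, and the `min_chunks` validation rule are hypothetical, and real search or vector stores provide their own alias or collection-swap mechanisms.

```python
import threading
from typing import Callable, Dict, Optional, Tuple

Index = Dict[str, str]  # chunk_id -> chunk text (vectors omitted for brevity)

class IndexRegistry:
    """Keeps immutable index versions plus a pointer to the one serving queries."""

    def __init__(self) -> None:
        self._versions: Dict[int, Index] = {}
        self._active: Optional[int] = None
        self._lock = threading.Lock()

    def publish(self, version: int, index: Index) -> None:
        # A published version is stored but not yet visible to queries.
        with self._lock:
            self._versions[version] = index

    def activate(self, version: int) -> None:
        # Atomic pointer flip: readers see the old or the new version, never a mix.
        with self._lock:
            if version not in self._versions:
                raise KeyError(f"unknown index version {version}")
            self._active = version

    def active(self) -> Tuple[Optional[int], Index]:
        with self._lock:
            if self._active is None:
                return None, {}
            return self._active, self._versions[self._active]

def run_batch_job(registry: IndexRegistry, version: int,
                  build: Callable[[], Index], min_chunks: int = 1) -> bool:
    """Nightly job: build a new version, validate it, and only then activate it."""
    try:
        new_index = build()
    except Exception:
        return False  # Build failed: the previous version keeps serving.
    if len(new_index) < min_chunks:
        return False  # Validation failed (e.g. an empty build): do not activate.
    registry.publish(version, new_index)
    registry.activate(version)
    return True

registry = IndexRegistry()
run_batch_job(registry, 1, lambda: {"policy#0": "Refunds within 30 days."})
run_batch_job(registry, 2, lambda: {})  # a bad (empty) build is rejected
print(registry.active())  # (1, {'policy#0': 'Refunds within 30 days.'})
```

Rollback is the same pointer flip, `registry.activate(previous_version)`, as long as old versions are retained.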
Advantages
- Simpler architecture
- Easier debugging
- Lower cost
- Predictable load
- Easier evaluation
- Easier rollback
Disadvantages
- Stale between runs
- Not ideal for fast-changing data
- Large jobs may take time
- Update delay can affect correctness
👉 Interview Answer
Batch RAG is usually implemented with scheduled ingestion jobs.
The system periodically reads documents, chunks them, generates embeddings, updates the index, and makes a new index version available.
This is operationally simpler, but freshness depends on the batch frequency.
6️⃣ Real-time RAG Architecture
High-Level Architecture
Source System
→ Change Event / CDC
→ Stream Processor
→ Chunker
→ Embedding Service
→ Index Update
→ Retriever
→ LLM
Real-time Flow
Document updated
→ Change event emitted
→ Worker processes update
→ Embedding regenerated
→ Index updated
→ New content becomes searchable
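The real-time flow above can be sketched as a worker that consumes change events and replaces all chunks of the changed document. This is a minimal sketch assuming an in-process `queue.Queue` stands in for a stream or CDC topic; `ChangeEvent`, `handle_event`, `drain`, and the `doc_id#n` chunk-ID scheme are hypothetical. Deleting a document's old chunks before writing new ones matters: otherwise a shortened document leaves orphaned chunks that stay searchable.

```python
import queue
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class ChangeEvent:
    doc_id: str
    text: Optional[str]  # None means the document was deleted at the source

def chunk(text: str, max_words: int = 5) -> List[str]:
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def handle_event(index: Dict[str, str], event: ChangeEvent) -> None:
    """Re-chunk the changed document and replace all of its chunks in the index."""
    # Drop every existing chunk of this document first, so a shorter new version
    # (or a delete) does not leave orphaned chunks behind.
    stale = [cid for cid in index if cid.rpartition("#")[0] == event.doc_id]
    for cid in stale:
        del index[cid]
    if event.text is None:
        return
    for n, piece in enumerate(chunk(event.text)):
        # A real worker would call the embedding service here and upsert the vector.
        index[f"{event.doc_id}#{n}"] = piece

def drain(events: queue.Queue, index: Dict[str, str]) -> None:
    """Process all pending events; a long-running worker would block on events.get()."""
    while True:
        try:
            event = events.get_nowait()
        except queue.Empty:
            return
        handle_event(index, event)
        events.task_done()

index: Dict[str, str] = {}
events: queue.Queue = queue.Queue()
events.put(ChangeEvent("runbook", "Restart the payment service then page the on-call engineer"))
events.put(ChangeEvent("runbook", "Page the on-call engineer"))
drain(events, index)
print(index)  # {'runbook#0': 'Page the on-call engineer'}
```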
Query-time Fresh Retrieval
User asks question
→ Retriever finds static context
→ Tool fetches latest source data
→ LLM combines both
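The query-time path above can be sketched as follows: static context comes from the index, current state comes from a live lookup bounded by a timeout, and the prompt labels which is which. The sketch assumes a hypothetical `fetch_account_status` tool (simulated here) and uses `concurrent.futures` for the timeout. If the live call fails, the prompt says so explicitly instead of silently letting the model fall back on possibly stale indexed text.

```python
import concurrent.futures as cf
import time
from typing import Callable, List

_POOL = cf.ThreadPoolExecutor(max_workers=4)

def fetch_account_status(account_id: str) -> str:
    """Hypothetical live source-of-truth lookup (simulated)."""
    return f"account {account_id}: BLOCKED (as of {time.strftime('%H:%M:%S')})"

def build_prompt(question: str, static_chunks: List[str],
                 live_fetch: Callable[[], str], timeout_s: float = 0.5) -> str:
    """Combine indexed background context with a time-bounded live lookup."""
    future = _POOL.submit(live_fetch)
    try:
        live = future.result(timeout=timeout_s)
    except cf.TimeoutError:
        live = "UNAVAILABLE: live lookup timed out; do not guess current state."
    except Exception as exc:  # the live call itself raised
        live = f"UNAVAILABLE: live lookup failed ({type(exc).__name__})."
    static = "\n".join(f"- {c}" for c in static_chunks)
    return (f"Background documents (may be stale):\n{static}\n\n"
            f"Current state from source of truth:\n{live}\n\n"
            f"Question: {question}")

prompt = build_prompt(
    "Is customer 42 currently blocked?",
    ["Accounts are blocked after three failed payments."],
    lambda: fetch_account_status("42"),
)
print(prompt)
```

The timeout is what keeps query latency bounded; without it, a slow source system becomes a slow answer.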
Advantages
- Fresh information
- Better for operational systems
- Reduces stale-answer risk
- Supports dynamic workflows
Disadvantages
- More complex
- Higher cost
- Harder to debug
- More failure modes
- More variable latency
👉 Interview Answer
Real-time RAG usually relies on events, streams, CDC, or query-time tool calls.
It is useful when the answer depends on fresh operational data, but it requires more infrastructure, stronger failure handling, and more careful consistency controls.
7️⃣ Freshness Requirements
Key Question
How fresh does the answer need to be?
Examples
Freshness Not Critical
"What is our coding standard?"
Batch RAG is usually enough.
Freshness Critical
"Is this customer currently blocked?"
Real-time retrieval may be needed.
Freshness Classes
| Freshness Need | Example | Best Fit |
|---|---|---|
| Days | Wiki docs | Batch RAG |
| Hours | Product docs | Batch / near-real-time |
| Minutes | Tickets, incidents | Near-real-time RAG |
| Seconds | Account state, inventory | Real-time RAG / tools |
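The freshness classes in the table can be turned into a simple routing rule: state the maximum staleness each data type tolerates and pick the cheapest tier whose worst-case lag fits. The tier lags below are assumed example values (a nightly batch job, a near-real-time pipeline with a few minutes of lag); measure your own pipeline's lag before relying on numbers like these.

```python
from datetime import timedelta

# Assumed worst-case staleness of each tier, cheapest first (example values only).
TIERS = [
    ("batch", timedelta(hours=24)),            # nightly batch job
    ("near_real_time", timedelta(minutes=5)),  # CDC/stream indexing lag
    ("source_of_truth_tool", timedelta(0)),    # live lookup at query time
]

def pick_tier(max_staleness: timedelta) -> str:
    """Return the cheapest tier whose worst-case staleness fits the requirement."""
    for name, worst_case_lag in TIERS:
        if worst_case_lag <= max_staleness:
            return name
    return TIERS[-1][0]

requirements = {
    "wiki docs": timedelta(days=2),
    "product docs": timedelta(hours=1),
    "tickets": timedelta(minutes=10),
    "account state": timedelta(seconds=5),
}
for data_type, need in requirements.items():
    print(f"{data_type:14} -> {pick_tier(need)}")
```

With an hourly batch job instead of a nightly one, product docs would land in the batch tier, which is why the table lists "Batch / near-real-time" for that row.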
👉 Interview Answer
The first design question is freshness.
If the knowledge changes slowly, batch RAG is usually enough.
If the answer depends on current operational state, real-time RAG or direct tool calls are usually required.
8️⃣ Index Freshness vs Source Freshness
Important Difference
Index freshness and source freshness are not the same.
Source Freshness
The source system has the latest truth.
Example:
Database says account is blocked now.
Index Freshness
The RAG index may lag behind.
Example:
Vector index still has yesterday's account state.
Design Principle
For critical current state, query the source of truth.
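A minimal sketch of the distinction, assuming each indexed record carries an `indexed_at` timestamp: serve from the index only when the copy is fresh enough for the question, otherwise read the source of truth. `IndexedRecord`, `read_account_state`, and the `source` callable are hypothetical; the returned origin label records which path was used so it can be logged. For correctness-critical state, the simpler rule is to skip the check and always call the source.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional, Tuple

@dataclass
class IndexedRecord:
    value: str
    indexed_at: datetime  # when this copy was written into the index

def read_account_state(record: IndexedRecord, source: Callable[[], str],
                       max_staleness: timedelta,
                       now: Optional[datetime] = None) -> Tuple[str, str]:
    """Return (value, origin); origin is 'index' or 'source_of_truth'."""
    now = now or datetime.now(timezone.utc)
    if now - record.indexed_at <= max_staleness:
        return record.value, "index"
    # The indexed copy is too old for this question: go to the source system.
    return source(), "source_of_truth"

now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
yesterday = IndexedRecord("active", indexed_at=now - timedelta(days=1))
print(read_account_state(yesterday, lambda: "blocked", timedelta(seconds=30), now))
# ('blocked', 'source_of_truth')
```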
👉 Interview Answer
Real-time systems must distinguish source freshness from index freshness.
The source system may have the latest truth, while the RAG index may lag.
For correctness-critical current state, I would query the source of truth directly instead of relying only on the vector index.
9️⃣ Query-time Retrieval vs Index-time Retrieval
Index-time Retrieval
Knowledge is embedded and indexed before queries.
Document
→ Embed
→ Index
→ Retrieve later
Query-time Retrieval
Fresh data is fetched when the user asks.
User query
→ Call live API / database
→ Add result to prompt
When Query-time Is Better
Use query-time retrieval when:
- Data changes rapidly
- Source of truth must be current
- Access control is dynamic
- Structured lookup is needed
- User-specific data is involved
👉 Interview Answer
Not all knowledge should be embedded into an index.
For fast-changing or correctness-critical data, query-time retrieval from the source of truth is often better than relying on stale indexed embeddings.
🔟 Consistency Challenges
Real-time RAG Consistency Problems
Real-time systems may experience:
- Out-of-order events
- Partial updates
- Duplicate events
- Failed embeddings
- Stale cache
- Index lag
- Deleted document still searchable
Example
Policy deleted
→ Delete event delayed
→ Old policy still appears in retrieval
Controls
- Version numbers
- Timestamps
- Idempotent updates
- Delete propagation
- Index freshness checks
- Event replay
- Dead-letter queues
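Several of these controls fit in one small function: keep the highest version seen per document, ignore events that are not newer (which makes duplicates and replays idempotent and drops out-of-order stale updates), and keep the version after a delete as a tombstone so a delayed older update cannot resurrect a deleted document. This is a sketch assuming the source emits a monotonically increasing `version` per document; `Event`, `VersionedIndex`, and `apply` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass(frozen=True)
class Event:
    doc_id: str
    version: int         # monotonically increasing per document at the source
    text: Optional[str]  # None = delete

class VersionedIndex:
    def __init__(self) -> None:
        self.docs: Dict[str, str] = {}
        self.latest_version: Dict[str, int] = {}  # kept after deletes (tombstones)

    def apply(self, event: Event) -> bool:
        """Apply an event only if it is newer than what we have; return True if applied."""
        if event.version <= self.latest_version.get(event.doc_id, -1):
            return False  # duplicate, replay, or out-of-order stale event
        self.latest_version[event.doc_id] = event.version
        if event.text is None:
            self.docs.pop(event.doc_id, None)  # propagate the delete
        else:
            self.docs[event.doc_id] = event.text
        return True

    def search(self, term: str) -> List[str]:
        return [d for d, text in self.docs.items() if term.lower() in text.lower()]

idx = VersionedIndex()
for e in [
    Event("policy-7", 1, "Old travel policy"),
    Event("policy-7", 3, None),                    # delete arrives
    Event("policy-7", 2, "Edited travel policy"),  # delayed older update
    Event("policy-7", 3, None),                    # duplicate delete
]:
    idx.apply(e)
print(idx.search("travel"))  # []
```

In practice, tombstones are typically retained at least as long as the maximum expected event delay before being compacted.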
👉 Interview Answer
Real-time RAG introduces consistency challenges.
Events may arrive out of order, updates may fail, and deleted documents may remain searchable.
Production systems need versioning, idempotency, timestamps, replay, and delete propagation.
1️⃣1️⃣ Cost and Latency
Batch RAG Cost
Batch RAG has predictable cost.
Process documents once per schedule
→ Reuse index for many queries
Real-time RAG Cost
Real-time RAG has higher cost.
It may require:
- Streaming pipeline
- Frequent embedding calls
- More index writes
- Live API calls
- More caching
- More monitoring
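One concrete source of the cost gap: a batch job re-embeds each changed document once per run, however many times it was edited in between, while a real-time pipeline re-embeds on every change event. The sketch counts document re-embeddings for a hypothetical edit log; it ignores chunk counts and pricing, which vary by model and provider, and `embedding_calls` is an illustrative name.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def embedding_calls(edits: List[Tuple[float, str]], batch_interval_h: float) -> Dict[str, int]:
    """Count document re-embeddings for real-time vs batch processing.

    edits: (hour_of_edit, doc_id) pairs. Real-time embeds once per edit; batch
    embeds each document once per batch window in which it changed.
    """
    windows = defaultdict(set)
    for hour, doc_id in edits:
        windows[int(hour // batch_interval_h)].add(doc_id)
    return {
        "real_time": len(edits),
        "batch": sum(len(docs) for docs in windows.values()),
    }

# Hypothetical day: a busy ticket is edited 30 times, a wiki page twice.
edits = [(h * 0.5, "ticket-1") for h in range(30)] + [(3.0, "wiki-9"), (15.0, "wiki-9")]
print(embedding_calls(edits, batch_interval_h=24))  # {'real_time': 32, 'batch': 2}
```

The flip side is visible in the same data: with a 24-hour window, the batch index reflects the ticket only as of the last run, while real-time indexing pays 32 embedding calls to stay current.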
Latency Difference
| System | Query Latency |
|---|---|
| Batch RAG | Lower and stable |
| Real-time RAG | Higher and variable |
👉 Interview Answer
Batch RAG usually has lower and more predictable cost and latency.
Real-time RAG improves freshness, but it adds streaming infrastructure, more frequent embeddings, live source calls, and more variable latency.
1️⃣2️⃣ Failure Handling
Batch RAG Failure
A failure usually just delays the next index update.
Example:
Nightly job fails
→ Index remains at previous version
Real-time RAG Failure
A failure may directly affect the correctness of live answers.
Example:
Update event fails
→ Index misses latest change
→ User receives stale answer
Controls
- Retry policies
- Dead-letter queues
- Backfill jobs
- Index version rollback
- Health checks
- Freshness alerts
- Source-of-truth fallback
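Retries and a dead-letter queue can be sketched as a wrapper around the event handler: transient failures are retried with exponential backoff, and events that still fail are parked in a dead-letter list instead of being dropped, so a backfill job can replay them later. The handlers, `max_attempts`, and backoff values are assumptions for illustration, and `process_with_retries` and `backfill` are hypothetical names.

```python
import time
from typing import Callable, List

def process_with_retries(event: dict, handler: Callable[[dict], None],
                         dead_letters: List[dict], max_attempts: int = 3,
                         base_delay_s: float = 0.01) -> bool:
    """Try handler up to max_attempts times; park the event in dead_letters on failure."""
    last_error = ""
    for attempt in range(max_attempts):
        try:
            handler(event)
            return True
        except Exception as exc:
            last_error = repr(exc)
            if attempt < max_attempts - 1:
                time.sleep(base_delay_s * 2 ** attempt)  # exponential backoff
    dead_letters.append({**event, "error": last_error})
    return False

def backfill(dead_letters: List[dict], handler: Callable[[dict], None]) -> int:
    """Replay parked events (e.g. after the embedding service recovers)."""
    remaining, replayed = [], 0
    for event in dead_letters:
        clean = {k: v for k, v in event.items() if k != "error"}
        try:
            handler(clean)
            replayed += 1
        except Exception:
            remaining.append(event)
    dead_letters[:] = remaining
    return replayed

calls = {"n": 0}
def flaky_embed(event: dict) -> None:
    calls["n"] += 1
    if calls["n"] <= 2:
        raise TimeoutError("embedding service timeout")

def always_down(event: dict) -> None:
    raise ConnectionError("embedding service down")

dlq: List[dict] = []
print(process_with_retries({"doc_id": "t-1"}, flaky_embed, dlq))  # True (3rd attempt)
print(process_with_retries({"doc_id": "t-2"}, always_down, dlq))  # False
print(backfill(dlq, flaky_embed), dlq)                            # 1 []
```

Parking the event rather than dropping it is what makes the failure recoverable; the document is still stale until `backfill` succeeds, which is why freshness alerts and source-of-truth fallback stay on the list.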
👉 Interview Answer
Failure handling is more difficult in real-time RAG.
Batch failures usually delay updates, while real-time failures can cause inconsistent or stale answers.
Production systems need retries, backfills, dead-letter queues, freshness monitoring, and fallback to source systems.
1️⃣3️⃣ Hybrid Design: Batch + Real-time
Most Practical Design
Many production systems combine both.
Batch RAG
→ Stable knowledge base
Real-time tools / streams
→ Fresh operational data
Example
Policy docs → Batch indexed
Current account status → Live API call
Recent tickets → Near-real-time index
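The example above amounts to a routing table from data type to retrieval path. A minimal sketch, assuming three hypothetical retriever functions (stubbed here) and that the question has already been tagged with the data types it needs; in practice that tagging might come from rules or an LLM tool-selection step.

```python
from typing import Callable, Dict, List

# Hypothetical retrieval paths, stubbed for illustration.
def batch_index_search(q: str) -> str:
    return "[batch index] policy chunks"

def near_real_time_search(q: str) -> str:
    return "[near-real-time index] recent ticket chunks"

def live_account_api(q: str) -> str:
    return "[live API] current account status"

ROUTES: Dict[str, Callable[[str], str]] = {
    "policy_docs": batch_index_search,
    "recent_tickets": near_real_time_search,
    "account_status": live_account_api,
}

def gather_context(question: str, needed: List[str]) -> List[str]:
    """Fetch context from each retrieval path the question needs, in the order requested."""
    return [ROUTES[kind](question) for kind in needed if kind in ROUTES]

ctx = gather_context("Can this blocked customer get a refund?",
                     ["account_status", "policy_docs"])
print("\n".join(ctx))
```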
Why Hybrid Works
- Batch handles stable knowledge cheaply
- Real-time handles fast-changing facts
- Tools handle source-of-truth lookups
- System balances freshness and cost
👉 Interview Answer
The best production design is often hybrid.
Use batch RAG for stable knowledge, near-real-time indexing for moderately fresh data, and direct source-of-truth tools for highly dynamic or correctness-critical data.
1️⃣4️⃣ Common Failure Modes
Batch RAG Failure Modes
- Stale index
- Slow backfill
- Bad batch job
- Incorrect chunking
- Missed document updates
- Index version mismatch
Real-time RAG Failure Modes
- Event loss
- Duplicate events
- Out-of-order updates
- Index lag
- Delete propagation failure
- Live API timeout
- Higher query latency
Example
Customer status changed 10 seconds ago.
RAG index updates every hour.
Agent gives stale answer.
👉 Interview Answer
Batch RAG mainly fails through staleness.
Real-time RAG mainly fails through consistency, event processing, latency, and operational complexity.
Choosing the wrong freshness model can lead to incorrect answers.
1️⃣5️⃣ Decision Framework
Choose Batch RAG When
- Documents change slowly
- Staleness is acceptable
- Cost must be controlled
- Queries need low latency
- System simplicity matters
- Knowledge base is document-heavy
Choose Real-time RAG When
- Freshness matters
- Data changes frequently
- Operational state is involved
- User-specific data is needed
- Decisions depend on current truth
- Stale answers are risky
Choose Hybrid When
- Some knowledge is stable
- Some data is dynamic
- Some answers require source-of-truth lookup
- Cost and freshness both matter
👉 Interview Answer
I choose batch RAG for stable document knowledge, real-time RAG for freshness-sensitive operational data, and hybrid RAG when the system needs both stable knowledge and current state.
1️⃣6️⃣ Best Practices
Practical Rules
- Start with freshness requirements
- Separate stable knowledge from dynamic data
- Use batch indexing for stable docs
- Use streaming or CDC for fresh document updates
- Use direct tools for source-of-truth state
- Track index version and freshness
- Add backfill jobs
- Handle deletes carefully
- Cache with freshness-aware keys
- Log retrieval freshness in traces
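Two of these rules, freshness-aware cache keys and logging retrieval freshness, can be sketched together. The assumption here is that the cache key includes the active index version, so activating a new version naturally stops old cached answers from being served, and that each trace records the index version and its age. `cache_key`, `answer`, and the trace fields are hypothetical names, and the cached string is a stand-in for a real RAG answer.

```python
import hashlib
import json
import time
from typing import Dict, List

CACHE: Dict[str, str] = {}
TRACES: List[dict] = []

def cache_key(query: str, index_version: int) -> str:
    # Including the index version means a new index version never reuses old answers.
    raw = json.dumps({"q": query.strip().lower(), "v": index_version})
    return hashlib.sha256(raw.encode()).hexdigest()

def answer(query: str, index_version: int, index_built_at: float) -> str:
    key = cache_key(query, index_version)
    hit = key in CACHE
    if not hit:
        CACHE[key] = f"answer to {query!r} from index v{index_version}"  # stand-in for RAG
    TRACES.append({
        "query": query,
        "index_version": index_version,
        "index_age_s": round(time.time() - index_built_at, 1),
        "cache_hit": hit,
    })
    return CACHE[key]

built = time.time() - 3600  # pretend the active index was built an hour ago
answer("What is the refund policy?", 41, built)
answer("what is the refund policy? ", 41, built)      # cache hit (normalized query)
answer("What is the refund policy?", 42, time.time())  # new version -> cache miss
print([(t["index_version"], t["cache_hit"]) for t in TRACES])
# [(41, False), (41, True), (42, False)]
```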
Design Principle
Do not use a stale index as the source of truth for live state.
👉 Interview Answer
The key best practice is to separate knowledge types.
Stable documents can be indexed in batch.
Fast-changing operational state should come from real-time systems or source-of-truth tools.
RAG indexes should not be treated as the source of truth for live state.
🧠 Final Staff-Level Answer
👉 Interview Answer (Full Version)
The difference between batch RAG and real-time RAG is mainly about freshness.
Batch RAG updates the knowledge index periodically through scheduled ingestion jobs.
It is simpler, cheaper, more stable, easier to debug, and usually has lower query latency.
It works well for relatively stable knowledge such as documentation, internal wiki pages, policies, historical reports, and knowledge bases.
The trade-off is that answers may be stale between index updates.
Real-time RAG is designed for freshness-sensitive use cases.
It can update the index immediately when source data changes through events, streams, or CDC, or it can fetch fresh data directly from source systems at query time.
This is useful for operational data such as incidents, tickets, logs, metrics, customer account state, inventory, and pricing.
But real-time RAG is more complex.
It introduces streaming infrastructure, more frequent embedding generation, more index writes, variable latency, consistency challenges, and more failure modes.
A key design distinction is source freshness versus index freshness.
The source system may have the latest truth, while the RAG index may lag behind.
For correctness-critical live state, I would query the source of truth directly instead of relying only on an embedding index.
In production, the best design is often hybrid: use batch RAG for stable knowledge, near-real-time indexing for moderately fresh data, and direct tools or APIs for highly dynamic, user-specific, or correctness-critical data.
The core principle is: do not use a stale index as the source of truth for live state.
⭐ Final Insight
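The core difference between batch RAG and real-time RAG is freshness.
Batch RAG is simpler, more stable, and cheaper.
Real-time RAG is fresher, but more complex, more expensive, and harder to debug.
In production, the best design is usually not either/or but hybrid: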
Stable knowledge → Batch RAG
Moderately fresh data → Near-real-time indexing
Live correctness-critical state → Source-of-truth tools
The one sentence to remember:
Do not use a stale index as the source of truth for live state.
Implement