
🎯 Real-time RAG vs Batch RAG Systems

1️⃣ Core Framework

When comparing real-time RAG with batch RAG, I frame the discussion along these dimensions:

Data freshness requirements
Ingestion latency
Index update architecture
Query-time retrieval path
Cost and operational complexity
Consistency and correctness
Failure handling
Trade-offs: freshness vs stability

2️⃣ What Is Batch RAG?

Batch RAG updates the knowledge index periodically.

Documents are processed in scheduled jobs.

Documents
→ Batch ingestion job
→ Chunking
→ Embedding
→ Index update
→ Retrieval

Example Schedule

Every hour
Every night
Every weekend

Best For

Batch RAG is good for:

Stable documentation
Internal wiki pages
Product docs
Policy documents
Historical reports
Knowledge bases with slower updates

👉 Interview Answer

Batch RAG updates the retrieval index on a schedule.

Documents are collected, parsed, chunked, embedded, and indexed periodically.

It is simpler, cheaper, and more stable, but it may serve stale information between updates.

3️⃣ What Is Real-time RAG?

Real-time RAG Definition

Real-time RAG updates knowledge as soon as data changes, or retrieves fresh data directly at query time.

Two Common Forms

Real-time Index Update

Document changes
→ Event emitted
→ Re-chunk
→ Re-embed
→ Update index immediately

Query-time Fresh Retrieval

User query
→ Fetch latest data from source system
→ Add fresh context to prompt

Best For

Real-time RAG is useful for:

Incident data
Metrics
Logs
Customer account state
Inventory
Pricing
News
Support tickets
Frequently changing operational data

👉 Interview Answer

Real-time RAG is designed for freshness-sensitive use cases.

It either updates the retrieval index immediately when data changes, or fetches fresh data directly from source systems at query time.

This improves freshness, but it increases complexity and cost, and it raises query latency whenever data has to be fetched live at query time.

4️⃣ Core Difference

Batch RAG

Knowledge is refreshed periodically.

Real-time RAG

Knowledge is refreshed continuously or retrieved live.

Comparison Table

| Dimension | Batch RAG | Real-time RAG |
| --- | --- | --- |
| Freshness | Lower | Higher |
| Complexity | Lower | Higher |
| Cost | Lower | Higher |
| Latency | Lower and predictable | Higher and variable |
| Stability | Higher | Lower |
| Best for | Stable docs | Dynamic data |
| Failure risk | Lower | Higher |
| Operational burden | Lower | Higher |

👉 Interview Answer

The main difference is freshness.

Batch RAG refreshes knowledge periodically, while real-time RAG updates or retrieves knowledge continuously.

Batch RAG is simpler and more stable.

Real-time RAG is fresher but more complex and expensive.

5️⃣ Batch RAG Architecture

High-Level Architecture

Document Sources
→ Scheduled Ingestion Job
→ Parser
→ Cleaner
→ Chunker
→ Embedding Generator
→ Vector / Search Index
→ Retriever
→ LLM

Batch Flow

Nightly job starts
→ Read changed documents
→ Process documents
→ Generate embeddings
→ Update index
→ Mark index version active

Advantages

Simpler architecture
Easier debugging
Lower cost
Predictable load
Easier evaluation
Easier rollback

Disadvantages

Stale between runs
Not ideal for fast-changing data
Large jobs may take time
Update delay can affect correctness

👉 Interview Answer

Batch RAG is usually implemented with scheduled ingestion jobs.

The system periodically reads documents, chunks them, generates embeddings, updates the index, and makes a new index version available.

This is operationally simpler, but freshness depends on the batch frequency.
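The batch flow above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a reference implementation: `chunk_text`, `fake_embed`, `IndexStore`, `run_batch_job`, and `rollback` are hypothetical names, the chunker is a naive fixed-size splitter, and the "embedding" is a placeholder for a real model call. The point it shows is that each run builds a new index version and activates it only at the end, which is what makes rollback easy.

```python
from dataclasses import dataclass, field


def chunk_text(text: str, size: int = 40) -> list[str]:
    """Naive fixed-size character chunker (illustrative only)."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def fake_embed(chunk: str) -> list[float]:
    """Hypothetical stand-in for a real embedding model call."""
    return [float(len(chunk)), float(sum(map(ord, chunk)) % 97)]


@dataclass
class IndexStore:
    """Holds every built index version plus a pointer to the active one."""
    versions: dict = field(default_factory=lambda: {0: {}})
    active_version: int = 0

    def active(self) -> dict:
        return self.versions[self.active_version]


def run_batch_job(store: IndexStore, changed_docs: dict) -> int:
    """Build a new index version from the active one plus the changed documents."""
    # Work on a copy so the version serving queries is never mutated mid-job.
    new_index = dict(store.active())
    for doc_id, text in changed_docs.items():
        new_index[doc_id] = [(c, fake_embed(c)) for c in chunk_text(text)]
    new_version = max(store.versions) + 1
    store.versions[new_version] = new_index
    # Activation is the last step: if anything above raises, the previous
    # version simply keeps serving queries.
    store.active_version = new_version
    return new_version


def rollback(store: IndexStore, version: int) -> None:
    """Point queries back at an earlier index version."""
    if version not in store.versions:
        raise KeyError(f"unknown index version: {version}")
    store.active_version = version
```

Keeping old versions around costs storage, so a real system would likely prune them after some retention period.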

6️⃣ Real-time RAG Architecture

High-Level Architecture

Source System
→ Change Event / CDC
→ Stream Processor
→ Chunker
→ Embedding Service
→ Index Update
→ Retriever
→ LLM

Real-time Flow

Document updated
→ Change event emitted
→ Worker processes update
→ Embedding regenerated
→ Index updated
→ New content becomes searchable

Query-time Fresh Retrieval

User asks question
→ Retriever finds static context
→ Tool fetches latest source data
→ LLM combines both

Advantages

Fresh information
Better for operational systems
Reduces stale-answer risk
Supports dynamic workflows

Disadvantages

More complex
Higher cost
Harder to debug
More failure modes
More variable latency

👉 Interview Answer

Real-time RAG usually relies on events, streams, CDC, or query-time tool calls.

It is useful when the answer depends on fresh operational data, but it requires more infrastructure, stronger failure handling, and more careful consistency controls.
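The event-driven update path can be sketched as a small worker that applies change events to an index. Everything here is an assumption made for illustration: `ChangeEvent`, `LiveIndex`, `split_sentences`, and `fake_embed` are hypothetical names, the event source (CDC stream, queue) is left out, and `search` uses substring matching as a stand-in for vector search.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChangeEvent:
    doc_id: str
    op: str                      # "upsert" or "delete"
    text: Optional[str] = None   # new document body for upserts


def split_sentences(text: str) -> list[str]:
    """Naive sentence chunker (illustrative only)."""
    return [s.strip() for s in text.split(".") if s.strip()]


def fake_embed(chunk: str) -> list[float]:
    """Hypothetical stand-in for an embedding service call."""
    return [float(len(chunk))]


class LiveIndex:
    def __init__(self) -> None:
        self.docs: dict[str, list[tuple[str, list[float]]]] = {}

    def apply(self, event: ChangeEvent) -> None:
        """Apply one change event as soon as it arrives."""
        if event.op == "delete":
            self.docs.pop(event.doc_id, None)
        elif event.op == "upsert":
            # Replace all chunks of the document at once so chunks from the
            # previous revision do not linger next to the new ones.
            chunks = split_sentences(event.text or "")
            self.docs[event.doc_id] = [(c, fake_embed(c)) for c in chunks]
        else:
            raise ValueError(f"unknown operation: {event.op}")

    def search(self, term: str) -> list[str]:
        """Substring match standing in for vector similarity search."""
        return sorted(
            doc_id
            for doc_id, chunks in self.docs.items()
            if any(term in text for text, _ in chunks)
        )
```

This sketch assumes events arrive exactly once and in order, which a production stream will not guarantee.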

7️⃣ Freshness Requirements

Key Question

How fresh does the answer need to be?

Examples

Freshness Not Critical

"What is our coding standard?"

Batch RAG is usually enough.

Freshness Critical

"Is this customer currently blocked?"

Real-time retrieval may be needed.

Freshness Classes

| Freshness Need | Example | Best Fit |
| --- | --- | --- |
| Days | Wiki docs | Batch RAG |
| Hours | Product docs | Batch / near-real-time |
| Minutes | Tickets, incidents | Near-real-time RAG |
| Seconds | Account state, inventory | Real-time RAG / tools |

👉 Interview Answer

The first design question is freshness.

If the knowledge changes slowly, batch RAG is usually enough.

If the answer depends on current operational state, real-time RAG or direct tool calls are usually required.
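The freshness classes can be expressed as a simple lookup on the maximum staleness an answer can tolerate. This is only a sketch: `choose_strategy` is a hypothetical helper, and the exact cut-offs (one day, one hour, one minute) are my assumptions for where the classes begin, not fixed rules.

```python
from datetime import timedelta


def choose_strategy(max_staleness: timedelta) -> str:
    """Map a tolerated staleness to the freshness class it falls into."""
    if max_staleness >= timedelta(days=1):
        return "batch"
    if max_staleness >= timedelta(hours=1):
        return "batch-or-near-real-time"
    if max_staleness >= timedelta(minutes=1):
        return "near-real-time-index"
    # Anything that must be fresh within seconds should not depend on
    # a periodically refreshed index.
    return "real-time-index-or-live-tool"
```

For example, a wiki that tolerates two days of staleness maps to `"batch"`, while account state that must be fresh within five seconds maps to `"real-time-index-or-live-tool"`.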

8️⃣ Index Freshness vs Source Freshness

Important Difference

Index freshness and source freshness are not the same.

Source Freshness

The source system has the latest truth.

Example:

Database says account is blocked now.

Index Freshness

The RAG index may lag behind.

Example:

Vector index still has yesterday's account state.

Design Principle

For critical current state, query the source of truth.

👉 Interview Answer

Real-time systems must distinguish source freshness from index freshness.

The source system may have the latest truth, while the RAG index may lag.

For correctness-critical current state, I would query the source of truth directly instead of relying only on the vector index.
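One way to act on this distinction is to record when each fact was indexed and fall back to the source system once the indexed copy is older than the caller can tolerate. The names below (`IndexedFact`, `read_state`, `fetch_from_source`) are hypothetical, and the source call is passed in as a plain function so the sketch stays self-contained.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional


@dataclass
class IndexedFact:
    value: str
    indexed_at: datetime  # when this value was last written to the index


def read_state(
    indexed: IndexedFact,
    fetch_from_source: Callable[[], str],
    max_staleness: timedelta,
    now: Optional[datetime] = None,
) -> tuple[str, str]:
    """Return (value, origin), where origin is "index" or "source"."""
    now = now or datetime.now(timezone.utc)
    age = now - indexed.indexed_at
    if age <= max_staleness:
        return indexed.value, "index"
    # The indexed copy is too old to trust, so ask the source of truth.
    return fetch_from_source(), "source"
```

Returning the origin alongside the value makes it possible to log which path served the answer. For state that is truly correctness-critical, skipping the index entirely and always calling the source is the simpler choice.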

9️⃣ Query-time Retrieval vs Index-time Retrieval

Index-time Retrieval

Knowledge is embedded and indexed ahead of time, before any query arrives.

Document
→ Embed
→ Index
→ Retrieve later

Query-time Retrieval

Fresh data is fetched when the user asks.

User query
→ Call live API / database
→ Add result to prompt

When Query-time Is Better

Use query-time retrieval when:

Data changes rapidly
Source of truth must be current
Access control is dynamic
Structured lookup is needed
User-specific data is involved

👉 Interview Answer

Not all knowledge should be embedded into an index.

For fast-changing or correctness-critical data, query-time retrieval from the source of truth is often better than relying on stale indexed embeddings.
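The "add result to prompt" step can be sketched as a prompt builder that keeps indexed context and live facts in separate, labeled sections. `build_prompt` and the prompt wording are assumptions for illustration; the live facts would come from whatever API or database call the system makes at query time.

```python
def build_prompt(
    question: str,
    static_chunks: list[str],
    live_facts: dict[str, str],
) -> str:
    """Combine indexed context and query-time facts into one prompt."""
    lines = [
        "Answer using the context below.",
        "If static and live context disagree, prefer the live facts.",
        "",
        "Static context (from index):",
    ]
    lines += [f"- {chunk}" for chunk in static_chunks]
    lines += ["", "Live facts (fetched at query time):"]
    # Sort keys so the prompt is deterministic for a given input.
    lines += [f"- {key}: {value}" for key, value in sorted(live_facts.items())]
    lines += ["", f"Question: {question}"]
    return "\n".join(lines)
```

Labeling the two sections lets the model, and anyone reading traces later, tell which statements came from a possibly stale index and which came from the source of truth.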

🔟 Consistency Challenges

Real-time RAG Consistency Problems

Real-time systems may experience:

Out-of-order events
Partial updates
Duplicate events
Failed embeddings
Stale cache
Index lag
Deleted document still searchable

Example

Policy deleted
→ Delete event delayed
→ Old policy still appears in retrieval

Controls

Version numbers
Timestamps
Idempotent updates
Delete propagation
Index freshness checks
Event replay
Dead-letter queues

👉 Interview Answer

Real-time RAG introduces consistency challenges.

Events may arrive out of order, updates may fail, and deleted documents may remain searchable.

Production systems need versioning, idempotency, timestamps, replay, and delete propagation.
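Several of these controls fit in one small pattern: attach a monotonically increasing version to every event, ignore any event that is not newer than what the index already holds, and store deletes as tombstones. The sketch below assumes the source system can supply such a version per document; `Event` and `VersionedIndex` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    doc_id: str
    version: int          # assumed to increase with every change at the source
    text: Optional[str]   # None means the document was deleted


class VersionedIndex:
    def __init__(self) -> None:
        # doc_id -> (latest version seen, text or None for a tombstone)
        self._docs: dict[str, tuple[int, Optional[str]]] = {}

    def apply(self, event: Event) -> bool:
        """Apply an event; return False if it was a duplicate or arrived late."""
        current = self._docs.get(event.doc_id)
        if current is not None and event.version <= current[0]:
            return False  # idempotent: replays and stale events are no-ops
        # Deletes are kept as tombstones so a late upsert with an older
        # version cannot resurrect the document.
        self._docs[event.doc_id] = (event.version, event.text)
        return True

    def searchable(self) -> dict[str, str]:
        """Documents that retrieval is allowed to return."""
        return {d: t for d, (_, t) in self._docs.items() if t is not None}
```

With this guard, the delayed-delete example is handled in the other direction too: if the delete (version 3) arrives before a delayed edit (version 2), the edit is dropped and the old policy stays out of retrieval.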

1️⃣1️⃣ Cost and Latency

Batch RAG Cost

Batch RAG has predictable cost.

Process documents once per schedule
→ Reuse index for many queries

Real-time RAG Cost

Real-time RAG has higher cost.

It may require:

Streaming pipeline
Frequent embedding calls
More index writes
Live API calls
More caching
More monitoring

Latency Difference

| System | Query Latency |
| --- | --- |
| Batch RAG | Lower and stable |
| Real-time RAG | Higher and variable |

👉 Interview Answer

Batch RAG usually has lower and more predictable cost and latency.

Real-time RAG improves freshness, but it adds streaming infrastructure, more frequent embeddings, live source calls, and more variable latency.
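A rough way to see where the extra embedding cost comes from is to count embedding calls for the same change log under both models. The sketch assumes that every change re-embeds all chunks of the document, that the real-time path does no debouncing, and that a batch run embeds each changed document once per window no matter how often it changed. The function names and the sample numbers in the usage note are made up for illustration.

```python
def realtime_embedding_calls(
    events: list[tuple[int, str]], chunks_per_doc: int
) -> int:
    """One full re-embed per change event. Events are (timestamp_s, doc_id)."""
    return len(events) * chunks_per_doc


def batch_embedding_calls(
    events: list[tuple[int, str]], chunks_per_doc: int, window_s: int
) -> int:
    """One full re-embed per distinct document per batch window."""
    # Repeated edits to the same document inside one window collapse
    # into a single (window, doc_id) pair.
    distinct = {(ts // window_s, doc_id) for ts, doc_id in events}
    return len(distinct) * chunks_per_doc
```

With five events, where document `a` changes three times and `b` once in the first hour and `a` changes again in the second, and four chunks per document, the real-time path makes 20 embedding calls while an hourly batch makes 12. The gap widens as documents are edited more often within a window.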

1️⃣2️⃣ Failure Handling

Batch RAG Failure

A failure usually affects only the next index update; queries keep using the previous index version.

Example:

Nightly job fails
→ Index remains at previous version

Real-time RAG Failure

A failure may affect the correctness of live answers.

Example:

Update event fails
→ Index misses latest change
→ User receives stale answer

Controls

Retry policies
Dead-letter queues
Backfill jobs
Index version rollback
Health checks
Freshness alerts
Source-of-truth fallback

👉 Interview Answer

Failure handling is more difficult in real-time RAG.

Batch failures usually delay updates, while real-time failures can cause inconsistent or stale answers.

Production systems need retries, backfills, dead-letter queues, freshness monitoring, and fallback to source systems.
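Retries and a dead-letter queue can be sketched as a loop that gives each event a bounded number of attempts and sets aside the ones that still fail. `process_with_retries` is a hypothetical helper, and the retries here are immediate. A real worker would typically add backoff between attempts and persist the dead letters somewhere durable.

```python
from typing import Callable


def process_with_retries(
    events: list,
    handler: Callable[[object], None],
    max_attempts: int = 3,
) -> list[tuple[object, str]]:
    """Apply handler to each event; return (event, error) pairs that exhausted retries."""
    dead_letters: list[tuple[object, str]] = []
    for event in events:
        for attempt in range(1, max_attempts + 1):
            try:
                handler(event)
                break  # success: move on to the next event
            except Exception as exc:  # broad on purpose for this sketch
                if attempt == max_attempts:
                    # Park the event instead of dropping it, so a later
                    # replay or backfill job can pick it up.
                    dead_letters.append((event, repr(exc)))
    return dead_letters
```

Replaying is then just another call with the parked events, for example `process_with_retries([e for e, _ in dead_letters], handler)` once the failing dependency has recovered. Replay is only safe when the handler is idempotent.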

1️⃣3️⃣ Hybrid Design: Batch + Real-time

Most Practical Design

Many production systems combine both.

Batch RAG
→ Stable knowledge base

Real-time tools / streams
→ Fresh operational data

Example

Policy docs → Batch indexed

Current account status → Live API call

Recent tickets → Near-real-time index

Why Hybrid Works

Batch handles stable knowledge cheaply
Real-time handles fast-changing facts
Tools handle source-of-truth lookups
System balances freshness and cost

👉 Interview Answer

The best production design is often hybrid.

Use batch RAG for stable knowledge, near-real-time indexing for moderately fresh data, and direct source-of-truth tools for highly dynamic or correctness-critical data.
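The hybrid design boils down to a routing table from data kind to retrieval path. The sketch below uses the three examples above as entries; `Route`, `ROUTES`, and `plan_retrieval` are hypothetical names, and a real router would more likely classify the query than receive data kinds directly.

```python
from enum import Enum


class Route(Enum):
    BATCH_INDEX = "batch_index"
    NEAR_REAL_TIME_INDEX = "near_real_time_index"
    LIVE_TOOL = "live_tool"


# Assumed mapping, mirroring the examples in this section.
ROUTES = {
    "policy_docs": Route.BATCH_INDEX,
    "recent_tickets": Route.NEAR_REAL_TIME_INDEX,
    "account_status": Route.LIVE_TOOL,
}


def plan_retrieval(data_kinds: list[str]) -> dict[Route, list[str]]:
    """Group the data kinds a query needs by the path that should serve them."""
    plan: dict[Route, list[str]] = {}
    for kind in data_kinds:
        if kind not in ROUTES:
            # Fail loudly rather than defaulting to an index that may be
            # stale for a kind of data nobody has classified yet.
            raise KeyError(f"no route configured for data kind: {kind}")
        plan.setdefault(ROUTES[kind], []).append(kind)
    return plan
```

A question such as "Can this customer get a refund?" might need both `policy_docs` and `account_status`, so one query can fan out to the batch index and a live tool at the same time.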

1️⃣4️⃣ Common Failure Modes

Batch RAG Failure Modes

Stale index
Slow backfill
Bad batch job
Incorrect chunking
Missed document updates
Index version mismatch

Real-time RAG Failure Modes

Event loss
Duplicate events
Out-of-order updates
Index lag
Delete propagation failure
Live API timeout
Higher query latency

Example

Customer status changed 10 seconds ago.
RAG index updates every hour.
Agent gives stale answer.

👉 Interview Answer

Batch RAG mainly fails through staleness.

Real-time RAG mainly fails through consistency, event processing, latency, and operational complexity.

Choosing the wrong freshness model can lead to incorrect answers.

1️⃣5️⃣ Decision Framework

Choose Batch RAG When

Documents change slowly
Staleness is acceptable
Cost must be controlled
Queries need low latency
System simplicity matters
Knowledge base is document-heavy

Choose Real-time RAG When

Freshness matters
Data changes frequently
Operational state is involved
User-specific data is needed
Decisions depend on current truth
Stale answers are risky

Choose Hybrid When

Some knowledge is stable
Some data is dynamic
Some answers require source-of-truth lookup
Cost and freshness both matter

👉 Interview Answer

I choose batch RAG for stable document knowledge, real-time RAG for freshness-sensitive operational data, and hybrid RAG when the system needs both stable knowledge and current state.

1️⃣6️⃣ Best Practices

Practical Rules

Start with freshness requirements
Separate stable knowledge from dynamic data
Use batch indexing for stable docs
Use streaming or CDC for fresh document updates
Use direct tools for source-of-truth state
Track index version and freshness
Add backfill jobs
Handle deletes carefully
Cache with freshness-aware keys
Log retrieval freshness in traces

Design Principle

Do not use a stale index as the source of truth for live state.

👉 Interview Answer

The key best practice is to separate knowledge types.

Stable documents can be indexed in batch.

Fast-changing operational state should come from real-time systems or source-of-truth tools.

RAG indexes should not be treated as the source of truth for live state.
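"Cache with freshness-aware keys" can be read as: make the cache key depend on the index version and on a time bucket, so cached answers stop matching once the index is rebuilt or the freshness budget elapses. This is one possible interpretation, sketched with a hypothetical `cache_key` helper; the bucket size is an assumed parameter.

```python
import hashlib


def cache_key(
    query: str,
    index_version: int,
    freshness_bucket_s: int,
    now_s: float,
) -> str:
    """Build a cache key that changes with the index version and time bucket."""
    # All timestamps inside the same bucket share a key; crossing into the
    # next bucket produces a new key, which acts as an implicit expiry.
    bucket = int(now_s // freshness_bucket_s)
    normalized = query.strip().lower()
    raw = f"{index_version}|{bucket}|{normalized}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()
```

Old entries are never looked up again under this scheme, so the cache still needs its own eviction policy to reclaim space.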

🧠 Final Staff-Level Answer

👉 Interview Answer (Full Version)

The difference between batch RAG and real-time RAG is mainly about freshness.

Batch RAG updates the knowledge index periodically through scheduled ingestion jobs.

It is simpler, cheaper, more stable, easier to debug, and usually has lower query latency.

It works well for relatively stable knowledge such as documentation, internal wiki pages, policies, historical reports, and knowledge bases.

The trade-off is that answers may be stale between index updates.

Real-time RAG is designed for freshness-sensitive use cases.

It can update the index immediately when source data changes through events, streams, or CDC, or it can fetch fresh data directly from source systems at query time.

This is useful for operational data such as incidents, tickets, logs, metrics, customer account state, inventory, and pricing.

But real-time RAG is more complex.

It introduces streaming infrastructure, more frequent embedding generation, more index writes, variable latency, consistency challenges, and more failure modes.

A key design distinction is source freshness versus index freshness.

The source system may have the latest truth, while the RAG index may lag behind.

For correctness-critical live state, I would query the source of truth directly instead of relying only on an embedding index.

In production, the best design is often hybrid: use batch RAG for stable knowledge, near-real-time indexing for moderately fresh data, and direct tools or APIs for highly dynamic, user-specific, or correctness-critical data.

The core principle is: do not use a stale index as the source of truth for live state.

⭐ Final Insight

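The core difference between batch RAG and real-time RAG is freshness.

Batch RAG is simpler, more stable, and cheaper.

Real-time RAG is fresher, but more complex, more expensive, and harder to debug.

In production, the best design is usually not either-or but hybrid: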

Stable knowledge → Batch RAG

Moderately fresh data → Near-real-time indexing

Live correctness-critical state → Source-of-truth tools

The single most important takeaway:

Do not use a stale index as the source of truth for live state.
