
From Traditional SD to LLM System Design - 07 Hallucination Control System

Post by ailswan, June 07, 2026


🎯 Hallucination Control System


1️⃣ Core Upgrade Framework

This topic is about upgrading a traditional system design answer into an LLM system design answer for Hallucination Control System.

Traditional SD baseline:

A traditional correctness system relies on validation, constraints, tests, source-of-truth databases, and deterministic business rules.

LLM system upgrade:

Hallucination control adds retrieval grounding, citations, uncertainty handling, answer verification, constrained decoding, refusal policy, and post-generation validators.

I usually structure the answer around eight dimensions:

  1. traditional baseline
  2. LLM-specific components
  3. request lifecycle
  4. context and state
  5. safety and correctness
  6. evaluation and observability
  7. cost and latency
  8. Staff-level trade-offs

👉 Interview Answer

For Hallucination Control System, I would start from the traditional system design baseline, then explain what changes when the core capability is powered by an LLM. The traditional parts are still necessary: API gateway, services, storage, cache, queues, scaling, and reliability. The LLM-specific parts are prompt/context construction, model routing, retrieval or memory, tool execution, safety checks, evaluation, token cost, and observability. That is the main upgrade from traditional SD to LLM system design.


2️⃣ Traditional SD Baseline

A traditional correctness system relies on validation, constraints, tests, source-of-truth databases, and deterministic business rules.

A normal system design answer would usually cover:

What is good about this answer?

What is missing for LLM systems?

👉 Interview Answer

A traditional SD answer is still the foundation. But for an LLM system, that answer is incomplete because the model introduces probabilistic behavior, token constraints, quality evaluation, safety concerns, and cost dynamics. So I would keep the distributed systems foundation and add an LLM orchestration layer on top.


3️⃣ LLM System Upgrade

Hallucination control adds retrieval grounding, citations, uncertainty handling, answer verification, constrained decoding, refusal policy, and post-generation validators.

The main upgrade is to add an AI orchestration layer:

Traditional System
  ↓
LLM Orchestration Layer
  ↓
Model / Retrieval / Tools / Memory
  ↓
Validated User-Facing Output

👉 Interview Answer

The important shift is that the model is not the whole system. The model is one execution component behind an orchestration layer. The orchestration layer decides what context enters the prompt, which model is used, which tools are allowed, how outputs are validated, and how quality is measured.


4️⃣ High-Level Architecture

Question arrives
  ↓
Retriever collects evidence
  ↓
Context builder injects sources
  ↓
Model generates answer
  ↓
Claim extractor finds factual claims
  ↓
Citation checker maps claims to evidence
  ↓
Verifier flags unsupported claims
  ↓
Policy decides answer/refusal/escalation
  ↓
Feedback updates eval set
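The claim-extraction, citation-checking, and verification steps in the flow above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the names `Verdict`, `extract_claims`, `check_claim`, and `verify_answer` are hypothetical, each sentence is treated as one claim, and simple token overlap stands in for a real verifier (which would more likely be an entailment model or an LLM judge).

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Verdict:
    claim: str
    supported: bool
    source_id: Optional[str]  # evidence passage that backs the claim, if any


def extract_claims(answer: str) -> list[str]:
    # Naive claim extractor: treat each sentence as one factual claim.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]


def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def check_claim(claim: str, evidence: dict[str, str],
                min_overlap: float = 0.6) -> Verdict:
    # Stand-in verifier (assumption): a claim counts as supported when
    # enough of its tokens appear in a single evidence passage.
    claim_tokens = tokens(claim)
    best_id, best_score = None, 0.0
    for source_id, passage in evidence.items():
        score = len(claim_tokens & tokens(passage)) / max(len(claim_tokens), 1)
        if score > best_score:
            best_id, best_score = source_id, score
    supported = best_score >= min_overlap
    return Verdict(claim, supported, best_id if supported else None)


def verify_answer(answer: str, evidence: dict[str, str]) -> list[Verdict]:
    # Map every extracted claim to evidence and flag the unsupported ones.
    return [check_claim(c, evidence) for c in extract_claims(answer)]
```

The list of verdicts is what the policy step would consume: unsupported claims can be dropped, rewritten, or used to trigger refusal or escalation.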

Core components:

Architecture principle:

Separate these layers clearly:

👉 Interview Answer

For Hallucination Control System, I would design a layered architecture. The product layer handles users and requests. The orchestration layer builds context and controls model/tool execution. The model layer generates or reasons. The safety, evaluation, and observability layers make the system production-grade.


5️⃣ Traditional SD Starting Point

Start from the system design answer an interviewer already expects: clients, API gateway, stateless services, storage, cache, queue, scaling, reliability, and observability.

For Hallucination Control System, I would specifically discuss:

Staff-level detail:

A Staff-level hallucination control design does not promise zero hallucination. It defines risk classes, evidence requirements, validation gates, refusal behavior, and measurable quality targets.
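One way to make risk classes, evidence requirements, and refusal behavior concrete is a small policy table. The sketch below is illustrative only: the risk class names, thresholds, and the `decide` helper are assumptions, not values from this post.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    ANSWER = "answer"
    REFUSE = "refuse"
    ESCALATE = "escalate"


@dataclass(frozen=True)
class RiskPolicy:
    min_supported_ratio: float  # evidence requirement for this risk class
    on_failure: Action          # behavior when the validation gate is not met


# Hypothetical risk classes and thresholds, for illustration only.
POLICIES = {
    "low": RiskPolicy(min_supported_ratio=0.5, on_failure=Action.REFUSE),
    "medium": RiskPolicy(min_supported_ratio=0.8, on_failure=Action.REFUSE),
    "high": RiskPolicy(min_supported_ratio=1.0, on_failure=Action.ESCALATE),
}


def decide(risk_class: str, supported: int, total: int) -> Action:
    """Apply the validation gate for one answer.

    supported / total is the share of extracted claims backed by evidence.
    """
    policy = POLICIES[risk_class]
    if total == 0:
        # Design choice (assumption): with no checkable claims, fail closed.
        return policy.on_failure
    ratio = supported / total
    if ratio >= policy.min_supported_ratio:
        return Action.ANSWER
    return policy.on_failure
```

Because the gate is plain deterministic code, the same answer with the same verdicts always gets the same decision, which is what makes the quality target measurable.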

Memorize this answer:

For Hallucination Control System, I would not replace traditional system design with an LLM call. I would keep the traditional architecture, then add an orchestration layer that controls prompts, context, model routing, tools or memory, validation, evaluation, and cost. The Staff-level goal is to make probabilistic model behavior safe, measurable, debuggable, and cost-controlled.


6️⃣ What Changes in LLM System Design

Then explain what changes when the core intelligence is a model call: prompts, tokens, context, model routing, retrieval, tools, memory, safety, evaluation, and cost.



7️⃣ Request Lifecycle

Trace one request end to end so the design is not abstract. A strong answer shows exactly where context is built, where the model is called, and where outputs are validated.



8️⃣ Data and State Model

LLM systems create new state: conversation turns, prompt versions, model versions, retrieval traces, tool calls, memory records, evaluation labels, token usage, and safety decisions.



9️⃣ Online Path

The online path is latency-sensitive. It should be bounded, permission-safe, observable, and able to fall back when a model, retriever, or tool fails.
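A bounded call with a fallback might look like the sketch below. It assumes a thread pool is acceptable for enforcing the deadline; `call_with_fallback` and the returned route labels are hypothetical names chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

# Shared worker pool; the size is an arbitrary placeholder.
_POOL = ThreadPoolExecutor(max_workers=4)


def call_with_fallback(primary, fallback, timeout_s: float):
    """Run primary() under a deadline; use fallback() on timeout or error.

    Returns (result, route) so the trace can record which path answered.
    """
    future = _POOL.submit(primary)
    try:
        return future.result(timeout=timeout_s), "primary"
    except FutureTimeout:
        # Best effort only: a thread that is already running is not interrupted.
        future.cancel()
        return fallback(), "fallback:timeout"
    except Exception:
        # Model, retriever, or tool failure: degrade instead of failing the request.
        return fallback(), "fallback:error"
```

For a hallucination control system, the fallback is typically a safer and cheaper behavior, such as a refusal message or a cached grounded answer, rather than an ungrounded generation.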



10️⃣ Offline Path

The offline path handles indexing, embedding, eval set construction, prompt testing, model rollout, cache warming, analytics, and quality regression checks.



11️⃣ Scaling Strategy

Traditional scaling is still necessary, but LLM systems add expensive model execution, token limits, GPU/provider bottlenecks, vector indexes, and tool fan-out.



12️⃣ Latency Budget

Break down latency by gateway, orchestration, retrieval, reranking, model inference, tool calls, validation, and streaming response.
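As a rough sketch, the breakdown can be written down as a per-stage budget and checked against measured timings. The millisecond values below are placeholders invented for illustration, and `over_budget` is a hypothetical helper.

```python
# Hypothetical per-stage budget in milliseconds; the numbers are placeholders.
BUDGET_MS = {
    "gateway": 20,
    "orchestration": 30,
    "retrieval": 150,
    "reranking": 100,
    "model_inference": 1500,
    "tool_calls": 300,
    "validation": 200,
    "streaming": 100,
}


def over_budget(measured_ms: dict[str, float]) -> dict[str, float]:
    """Return {stage: overshoot_ms} for each stage that exceeded its budget.

    Stages missing from measured_ms are treated as not executed.
    """
    return {
        stage: measured_ms[stage] - limit
        for stage, limit in BUDGET_MS.items()
        if measured_ms.get(stage, 0.0) > limit
    }
```

Summing `BUDGET_MS.values()` gives the end-to-end target, and the per-stage overshoot shows whether verification or retrieval is the step to trim first.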



13️⃣ Reliability and Failure Handling

Failures include normal distributed systems failures plus model-specific failures: hallucination, low confidence, empty retrieval, unsafe tool call, context overflow, and bad prompt version.



14️⃣ Security and Privacy

LLM systems must handle prompt injection, data exfiltration, tenant isolation, sensitive context, tool permissions, audit logs, and safe logging.



15️⃣ Evaluation and Quality

A Staff-level answer must define quality metrics, offline golden sets, online feedback, regression gates, and rollout/rollback strategy.
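A regression gate can be as simple as comparing a candidate's golden-set metrics against the current baseline. The sketch below assumes every metric is "higher is better"; the metric names and the `max_drop` tolerance are illustrative assumptions.

```python
def regression_gate(baseline: dict[str, float],
                    candidate: dict[str, float],
                    max_drop: float = 0.01) -> list[str]:
    """Return the metrics where the candidate regressed by more than max_drop.

    Assumes higher is better for every metric (e.g. groundedness rate).
    A metric missing from the candidate run counts as a failure.
    An empty list means the candidate may roll out.
    """
    failures = []
    for name, base_value in baseline.items():
        cand_value = candidate.get(name)
        if cand_value is None or base_value - cand_value > max_drop:
            failures.append(name)
    return failures
```

A prompt or model change would then be promoted only when the gate returns an empty list, and rolled back if online feedback later contradicts the offline result.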



16️⃣ Cost Control

LLM systems need token accounting, model tiering, cache strategy, batching, context trimming, request budgets, and cost attribution by user/team/feature.
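Token accounting, model tiering, and per-team budgets can be sketched with a small ledger. The prices, tier names, and the `CostLedger` class are hypothetical; real prices vary by provider and model.

```python
from collections import defaultdict

# Hypothetical prices in USD per 1,000 tokens, for illustration only.
PRICE_PER_1K = {
    "small": {"input": 0.0005, "output": 0.0015},
    "large": {"input": 0.01, "output": 0.03},
}


class CostLedger:
    def __init__(self, budget_usd: dict[str, float]):
        self.budget_usd = budget_usd          # spending limit per team
        self.spent_usd = defaultdict(float)   # cost attribution per team

    def cost(self, tier: str, input_tokens: int, output_tokens: int) -> float:
        price = PRICE_PER_1K[tier]
        return (input_tokens * price["input"]
                + output_tokens * price["output"]) / 1000

    def charge(self, team: str, tier: str,
               input_tokens: int, output_tokens: int) -> bool:
        """Record the call if the team stays within budget; else return False."""
        amount = self.cost(tier, input_tokens, output_tokens)
        if self.spent_usd[team] + amount > self.budget_usd.get(team, 0.0):
            return False
        self.spent_usd[team] += amount
        return True
```

When `charge` returns `False`, the orchestrator can route to a cheaper tier, trim context, or reject the request, which keeps the budget decision deterministic rather than left to the model.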



17️⃣ Observability

Trace every model call with prompt version, model version, retrieval ids, tool calls, token counts, latency, cost, safety decisions, and final quality signals.
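A trace record covering these fields might be modeled as below. The `ModelCallTrace` class and its field names are assumptions for illustration; the one-JSON-object-per-line format is a common logging convention, not something this post prescribes.

```python
import json
import time
import uuid
from dataclasses import asdict, dataclass, field


@dataclass
class ModelCallTrace:
    prompt_version: str
    model_version: str
    retrieval_ids: list[str]
    input_tokens: int
    output_tokens: int
    latency_ms: float
    cost_usd: float
    safety_decision: str       # e.g. "answer", "refuse", "escalate"
    unsupported_claims: int    # quality signal from the verifier
    tool_calls: list[str] = field(default_factory=list)
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    timestamp: float = field(default_factory=time.time)

    def to_log_line(self) -> str:
        # One JSON object per line keeps traces easy to ship and query.
        return json.dumps(asdict(self), sort_keys=True)
```

With prompt and model versions on every record, a spike in `unsupported_claims` can be traced back to the specific rollout that caused it.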



18️⃣ Trade-offs

Compare quality, latency, cost, safety, freshness, determinism, explainability, and operational complexity.



19️⃣ Staff-Level Framing

The Staff-level answer is not that LLMs are smarter. It is that the system creates deterministic boundaries around probabilistic model behavior.



20️⃣ Traditional vs LLM Design Comparison

| Area | Traditional SD | LLM System Design |
|---|---|---|
| Core logic | deterministic business logic | model-assisted probabilistic reasoning |
| Input | structured API payload | messages, prompts, retrieved context, tool outputs |
| State | DB rows, cache entries, queue messages | conversation, memory, prompt/model/retrieval/tool traces |
| Correctness | unit tests and business rules | grounding, evaluation, validation, safety policy |
| Scaling | services, DB, cache, queues | plus inference capacity, token budget, vector indexes, tool fan-out |
| Cost | CPU, storage, network | plus input/output tokens, model tier, reranking, embeddings |
| Debugging | logs, metrics, traces | plus prompt version, model version, context, tool calls, eval labels |
| Failure | timeout, DB error, queue lag | plus hallucination, unsafe output, context overflow, low confidence |

21️⃣ Common Interview Follow-ups

Q: How do you start the answer?

A: Start with the traditional system baseline, then explicitly state what changes because this is an LLM system.

Q: Where does the LLM fit?

A: Behind an orchestration layer, not directly exposed as the system itself.

Q: How do you control quality?

A: Use offline evals, online feedback, regression gates, groundedness checks, and prompt/model version tracking.

Q: How do you control cost?

A: Track tokens, route models by task complexity, cache safe results, compress context, and enforce budgets.

Q: How do you handle hallucination?

A: Ground the answer when possible, validate claims, add citation checks, use refusal policy, and measure hallucination rate.

Q: How do you debug bad output?

A: Inspect the full trace: user input, context, prompt version, model version, retrieval results, tool calls, safety decisions, and final output.

Q: What is the Staff-level insight?

A: A Staff-level hallucination control design does not promise zero hallucination. It defines risk classes, evidence requirements, validation gates, refusal behavior, and measurable quality targets.


22️⃣ Answer Bank for Memorization

Memorization Paragraph 1

For Hallucination Control System, I would start with the traditional system design baseline and then upgrade it with LLM-specific components. The traditional baseline gives us API gateway, auth, stateless services, storage, cache, queues, scaling, and reliability. The LLM upgrade adds prompt/context construction, model routing, retrieval or memory, tool execution, safety checks, evaluation, token cost, and model observability.

Memorization Paragraph 2

The key design principle for Hallucination Control System is that the model should not be treated as the whole system. The model is one component behind an orchestration layer. The orchestration layer owns context, permissions, model choice, tool access, validation, cost control, and traces.

Memorization Paragraph 3

Compared with traditional SD, Hallucination Control System introduces probabilistic quality. That means I need both system metrics and quality metrics: latency, error rate, and cost on one side; groundedness, safety, user correction rate, and task success on the other side.

Memorization Paragraph 4

At Staff level, I would separate deterministic guarantees from model behavior. The system must deterministically enforce authorization, budgets, schema validation, logging, and rollback. The model can generate or reason, but the platform must make that behavior bounded and measurable.

Memorization Paragraph 5

I would design Hallucination Control System with both online and offline paths. The online path handles user requests under latency and safety constraints. The offline path builds indexes, eval datasets, prompt tests, analytics, cost dashboards, and regression gates so the system improves without silent quality regression.


23️⃣ Senior / Staff-Level Summary Answer

For Hallucination Control System, I would upgrade a traditional system design answer by adding an LLM orchestration layer. The traditional system still handles API, auth, services, storage, caching, queues, scaling, and reliability. The LLM layer handles prompt/context construction, model routing, retrieval or memory, tool execution, safety, evaluation, token cost, and observability. The Staff-level point is to make probabilistic model behavior safe, measurable, debuggable, and cost-controlled through deterministic system boundaries.


Chinese Section

🎯 Hallucination Control System

The core of this topic is how to upgrade a traditional System Design answer into an LLM System Design answer.

Traditional SD baseline:

A traditional correctness system relies on validation, constraints, tests, source-of-truth databases, and deterministic business rules.

LLM system upgrade:

Hallucination control adds retrieval grounding, citations, uncertainty handling, answer verification, constrained decoding, refusal policy, and post-generation validators.


1️⃣ Core Framework

When discussing Hallucination Control System, I would structure the answer like this:

  1. Start with the traditional system design baseline
  2. Then cover the components the LLM system adds
  3. Draw the full request path
  4. Explain context/state/model/tool/memory
  5. Cover safety, eval, cost, and observability
  6. Close with Staff-level trade-offs

Answer to memorize:

For Hallucination Control System, I would start from traditional system design rather than jumping straight to calling an LLM. The traditional part includes gateway, service, DB, cache, queue, scaling, and reliability. The LLM upgrade includes prompt, context, model routing, retrieval, memory, tool calling, safety, eval, token cost, and observability. The Staff-level focus is wrapping probabilistic model behavior in deterministic system boundaries.


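2️⃣ Traditional SD Starting Point

Start from traditional system design: client, gateway, stateless service, DB, cache, queue, scaling, reliability, and observability.

For Hallucination Control System, specifically cover:

Paragraph to memorize:

When discussing Hallucination Control System, I would clearly separate the traditional system parts from the LLM-specific parts. The traditional system provides stable services, storage, caching, queues, and scaling. LLM orchestration handles prompt, context, model, retrieval, tool, memory, safety, eval, and cost. This answer fits Staff-level system design better than simply saying "call a large model."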


3️⃣ What Changes After Upgrading to an LLM System

An LLM system adds prompt, token, context, retrieval, tool, memory, safety, eval, and cost control.



4️⃣ Request Path

In an interview, always trace one complete request, showing where context is built, where the model is called, and where output is validated.



5️⃣ Data and State Model

An LLM system adds state such as conversation, prompt version, model version, retrieval trace, tool call, memory, token usage, and safety decision.



6️⃣ Online Path

The online path focuses on latency, permission, bounded execution, fallback, and streaming.



7️⃣ Offline Path

The offline path handles indexing, embedding, eval, prompt test, model rollout, cache warming, and regression.



8️⃣ Scaling Strategy

Traditional horizontal scaling still matters, but an LLM system also has to handle model inference cost, token limit, vector index, tool fan-out, and provider bottleneck.



9️⃣ Latency Budget

Break down gateway, orchestrator, retriever, reranker, model inference, tool call, validation, and streaming latency.



10️⃣ Reliability and Failure Handling

Beyond traditional timeout and retry, also handle hallucination, empty retrieval, context overflow, unsafe tool call, and bad prompt version.



11️⃣ Security and Privacy

Focus on prompt injection, data leakage prevention, tenant isolation, tool permission, audit, and safe logging.



12️⃣ Evaluation

A Staff-level answer must cover offline golden set, online feedback, quality metrics, regression gate, and rollback.



13️⃣ Cost Control

This requires token accounting, model routing, cache, batching, context trimming, request budget, and cost attribution.



14️⃣ Observability

Trace model version, prompt version, retrieval ids, tool calls, tokens, latency, cost, safety, and quality.



15️⃣ Trade-off

Compare quality, latency, cost, safety, freshness, determinism, explainability, and operational complexity.



16️⃣ Staff-Level Framing

The core of a Staff-level answer: wrap probabilistic model behavior in deterministic system boundaries.



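17️⃣ Advanced Follow-up Questions

Q: How do you open?

A: Start with the traditional SD baseline, then state which layers the LLM system adds.

Q: Where does the LLM sit in the architecture?

A: The LLM is an execution component behind the orchestration layer, not the whole system.

Q: How do you control quality?

A: Offline eval, online feedback, grounding, safety check, regression gate.

Q: How do you control cost?

A: Token budget, model routing, cache, context compression, batching, cost attribution.

Q: How do you debug?

A: Look at the full trace: input, context, prompt version, model version, retrieval, tool calls, safety, output.

Q: What is the Staff-level insight?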

A: A Staff-level hallucination control design does not promise zero hallucination. It defines risk classes, evidence requirements, validation gates, refusal behavior, and measurable quality targets.


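18️⃣ Answer Bank for Memorization

Paragraph 1

An answer for Hallucination Control System cannot stop at traditional SD, and it cannot just say "call an LLM." My approach is to establish the traditional system design baseline first, then add an LLM orchestration layer. The traditional part keeps the service stable, the LLM part provides the intelligence, and the system boundaries handle safety, cost, evaluation, and observability.

Paragraph 2

Traditional system design mainly focuses on API, services, database, cache, queue, and scaling. After upgrading to Hallucination Control System, I also need to cover prompt version, model version, context window, retrieval, memory, tool calling, safety policy, eval pipeline, and token cost.

Paragraph 3

The focus of a Staff-level answer is separating deterministic system guarantees from probabilistic model behavior. Permissions, budgets, schema, audit, and rollback must be guaranteed deterministically by the system; the model only generates, understands, or reasons.

Paragraph 4

I would split Hallucination Control System into an online path and an offline path. The online path handles user requests, latency, safety, and fallback; the offline path handles index, eval, prompt test, cost analysis, quality regression, and rollout.

Paragraph 5

If Hallucination Control System has a quality problem, I would debug through the trace: what the user input was, what the context was, which prompt/model versions were used, what the retrieval and tool results were, what the safety decision was, and why the final output passed.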


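19️⃣ Staff-Level Summary

The Staff-level answer for Hallucination Control System is a transition from traditional SD to LLM system design. Traditional SD provides a stable distributed-systems foundation. The LLM system adds prompt, context, model, retrieval, memory, tool, safety, eval, cost, and observability. The core is not how smart the model is, but how the system makes model behavior controllable, measurable, debuggable, and rollback-capable.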


note:

Hallucination Control System = Traditional SD baseline + LLM orchestration layer + safety/eval/cost/observability.


Remember:

Implement