Traditional SD to LLM System Design

🎯 Hallucination Control System

1️⃣ Core Upgrade Framework

This topic covers how to upgrade a traditional system design answer into an LLM system design answer for a hallucination control system.

Traditional SD baseline:

A traditional correctness system relies on validation, constraints, tests, source-of-truth databases, and deterministic business rules.

LLM system upgrade:

Hallucination control adds retrieval grounding, citations, uncertainty handling, answer verification, constrained decoding, refusal policy, and post-generation validators.

I usually structure the answer around eight dimensions:

traditional baseline
LLM-specific components
request lifecycle
context and state
safety and correctness
evaluation and observability
cost and latency
Staff-level trade-offs

2️⃣ Traditional SD Baseline

A normal system design answer would usually cover:

client and API gateway
authentication and authorization
stateless application services
database and schema design
cache layer
message queue for async work
rate limiting and backpressure
horizontal scaling
replication and failover
metrics, logs, and alerts

What is good about this answer?

It is structured.
It covers scale and reliability.
It has clear storage and service boundaries.
It can be implemented with conventional distributed systems patterns.

What is missing for LLM systems?

no prompt/context layer
no model routing
no token budget
no model quality evaluation
no hallucination control
no tool/memory safety boundary
no cost-per-token control
no trace of model behavior

3️⃣ LLM System Upgrade

The main upgrade is to add an AI orchestration layer:

prompt builder
context manager
model router
retrieval or memory layer
tool-calling layer when needed
safety and policy layer
evaluation layer
usage and cost metering
trace and debugging layer

Traditional System
  ↓
LLM Orchestration Layer
  ↓
Model / Retrieval / Tools / Memory
  ↓
Validated User-Facing Output

4️⃣ High-Level Architecture

Question arrives
  ↓
Retriever collects evidence
  ↓
Context builder injects sources
  ↓
Model generates answer
  ↓
Claim extractor finds factual claims
  ↓
Citation checker maps claims to evidence
  ↓
Verifier flags unsupported claims
  ↓
Policy decides answer/refusal/escalation
  ↓
Feedback updates eval set

Core components:

grounding retriever
evidence store
citation checker
claim extractor
verifier model
rule validator
confidence scorer
refusal policy
human review queue
feedback loop
eval dataset

Architecture principle:

Separate these layers clearly:

product/API layer
orchestration layer
model execution layer
retrieval/memory/tool layer
safety/evaluation layer
observability/cost layer

5️⃣ Traditional SD Starting Point

Start from the system design answer an interviewer already expects: clients, API gateway, stateless services, storage, cache, queue, scaling, reliability, and observability.

For Hallucination Control System, I would specifically discuss:

API gateway and auth
service decomposition
storage choice
cache strategy
queue and async processing
rate limiting
SLO and failure handling
deployment and rollback

Staff-level detail:

A Staff-level hallucination control design does not promise zero hallucination. It defines risk classes, evidence requirements, validation gates, refusal behavior, and measurable quality targets.
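As a minimal sketch of how risk classes, evidence requirements, and refusal behavior could be written down as explicit policy, the snippet below maps each risk class to a support threshold and a failure action. The class names, thresholds, and the `decide` helper are assumptions made for illustration, not values from any real system.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    ANSWER = "answer"
    ANSWER_WITH_CAVEAT = "answer_with_caveat"
    REFUSE = "refuse"
    ESCALATE = "escalate"


@dataclass(frozen=True)
class RiskPolicy:
    min_support_ratio: float  # fraction of claims that must be backed by evidence
    on_failure: Action        # what to do when the gate is not met


# Hypothetical risk classes and thresholds, chosen only for illustration.
POLICIES = {
    "low": RiskPolicy(min_support_ratio=0.5, on_failure=Action.ANSWER_WITH_CAVEAT),
    "medium": RiskPolicy(min_support_ratio=0.8, on_failure=Action.REFUSE),
    "high": RiskPolicy(min_support_ratio=1.0, on_failure=Action.ESCALATE),
}


def decide(risk_class: str, supported_claims: int, total_claims: int) -> Action:
    """Apply the validation gate for one answer."""
    policy = POLICIES[risk_class]
    # An answer with no factual claims has nothing to verify.
    ratio = 1.0 if total_claims == 0 else supported_claims / total_claims
    if ratio >= policy.min_support_ratio:
        return Action.ANSWER
    return policy.on_failure
```

With these assumed thresholds, a high-risk answer with three of four claims supported is escalated to human review, while the same answer in the low-risk class is returned as-is.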

6️⃣ What Changes in LLM System Design

Then explain what changes when the core intelligence is a model call: prompts, tokens, context, model routing, retrieval, tools, memory, safety, evaluation, and cost.

For Hallucination Control System, I would specifically discuss:

prompt versioning
context construction
token budget
model selection
retrieval/memory/tool integration
safety validation
quality evaluation
cost attribution

7️⃣ Request Lifecycle

Trace one request end to end so the design is not abstract. A strong answer shows exactly where context is built, where the model is called, and where outputs are validated.

For Hallucination Control System, I would specifically discuss:

Question arrives
Retriever collects evidence
Context builder injects sources
Model generates answer
Claim extractor finds factual claims
Citation checker maps claims to evidence
Verifier flags unsupported claims
Policy decides answer/refusal/escalation
Feedback updates eval set
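The lifecycle above can be sketched as a single function. This is a toy version under stated assumptions: `retrieve` and `generate` are hypothetical callables supplied by the caller, each sentence is treated as one claim, and the "verifier" is a simple word-overlap check rather than a real verifier model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Result:
    decision: str           # "answer" or "refuse"
    answer: str
    unsupported: List[str]  # claims the verifier could not back with evidence


def split_claims(answer: str) -> List[str]:
    # Toy claim extractor: treat each sentence as one factual claim.
    return [s.strip() for s in answer.split(".") if s.strip()]


def is_supported(claim: str, evidence: List[str]) -> bool:
    # Toy verifier: a claim is supported when all of its words
    # appear in a single evidence passage.
    words = set(claim.lower().split())
    return any(
        words <= set(p.lower().replace(".", "").split()) for p in evidence
    )


def handle(
    question: str,
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str, List[str]], str],
) -> Result:
    evidence = retrieve(question)          # retriever collects evidence
    if not evidence:
        return Result("refuse", "", [])    # empty retrieval: refuse rather than guess
    answer = generate(question, evidence)  # context is injected, model is called
    claims = split_claims(answer)          # claim extraction
    unsupported = [c for c in claims if not is_supported(c, evidence)]
    decision = "refuse" if unsupported else "answer"  # policy step
    return Result(decision, answer, unsupported)
```

In a real design the policy step would also have an escalation branch and the unsupported claims would be logged as candidates for the eval set.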

8️⃣ Data and State Model

LLM systems create new state: conversation turns, prompt versions, model versions, retrieval traces, tool calls, memory records, evaluation labels, token usage, and safety decisions.

For Hallucination Control System, I would specifically discuss:

request id
user id / tenant id
conversation or task id
prompt version
model version
retrieval ids
tool call ids
memory ids
token counts
safety decision
quality label
cost record
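One way to make this state concrete is a single trace record per request. The sketch below is an assumption about shape only: the `RequestTrace` name, field names, and default values are illustrative and would map onto whatever storage schema the system actually uses.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class RequestTrace:
    request_id: str
    tenant_id: str
    user_id: str
    conversation_id: str
    prompt_version: str
    model_version: str
    retrieval_ids: List[str] = field(default_factory=list)
    tool_call_ids: List[str] = field(default_factory=list)
    memory_ids: List[str] = field(default_factory=list)
    input_tokens: int = 0
    output_tokens: int = 0
    safety_decision: str = "pending"
    quality_label: Optional[str] = None  # filled in later by eval or user feedback
    cost_usd: float = 0.0

    def to_json(self) -> str:
        # Stable key order keeps log lines easy to diff.
        return json.dumps(asdict(self), sort_keys=True)
```

Keeping prompt version, model version, and retrieval ids on the same record is what later allows a bad answer to be reproduced and attributed.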

9️⃣ Online Path

The online path is latency-sensitive. It should be bounded, permission-safe, observable, and able to fall back when a model, retriever, or tool fails.

For Hallucination Control System, I would specifically discuss:

explicit ownership
bounded execution
safe defaults
versioned behavior
measurable quality
debuggable traces
fallback path
operational readiness
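Bounded execution with a fallback path can be sketched with a timeout around the primary call. This is a minimal illustration using only the standard library; `call_with_fallback` and the pool size are assumptions, and a production system would more likely rely on the timeout and cancellation support of its model client.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

# Shared worker pool for model, retriever, or tool calls (size is illustrative).
_POOL = ThreadPoolExecutor(max_workers=4)


def call_with_fallback(primary, fallback, timeout_s: float):
    """Run `primary` with a deadline; return (value, path_taken)."""
    future = _POOL.submit(primary)
    try:
        return future.result(timeout=timeout_s), "primary"
    except FutureTimeout:
        # Best effort only: a call that is already running is not interrupted.
        future.cancel()
        return fallback(), "fallback"
    except Exception:
        # Primary failed outright: degrade to the safe default.
        return fallback(), "fallback"
```

The returned path label is meant to be written to the request trace so that fallback rates stay visible.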

10️⃣ Offline Path

The offline path handles indexing, embedding, eval set construction, prompt testing, model rollout, cache warming, analytics, and quality regression checks.

For Hallucination Control System, I would specifically discuss:

explicit ownership
bounded execution
safe defaults
versioned behavior
measurable quality
debuggable traces
fallback path
operational readiness

11️⃣ Scaling Strategy

Traditional scaling is still necessary, but LLM systems add expensive model execution, token limits, GPU/provider bottlenecks, vector indexes, and tool fan-out.

For Hallucination Control System, I would specifically discuss:

stateless service horizontal scaling
queue-based async processing
cache hot paths
model-provider concurrency limits
GPU or inference capacity
vector index sharding
tool fan-out control
backpressure and admission control
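Provider concurrency limits and admission control can be sketched with an `asyncio.Semaphore` plus a bounded wait queue. The `AdmissionController` class and its limits are hypothetical; the point is only that requests beyond the waiting limit are rejected early instead of piling up.

```python
import asyncio


class Overloaded(Exception):
    """Raised when the request is shed instead of queued."""


class AdmissionController:
    def __init__(self, max_in_flight: int, max_waiting: int):
        self._sem = asyncio.Semaphore(max_in_flight)  # provider concurrency limit
        self._max_waiting = max_waiting               # bound on queued requests
        self._waiting = 0

    async def run(self, call):
        # Shed load when all slots are busy and the wait queue is full.
        if self._sem.locked() and self._waiting >= self._max_waiting:
            raise Overloaded("too many queued requests")
        self._waiting += 1
        try:
            await self._sem.acquire()
        finally:
            self._waiting -= 1
        try:
            return await call()
        finally:
            self._sem.release()
```

Rejecting early gives the caller a chance to return a cached or degraded answer while the provider is saturated.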

12️⃣ Latency Budget

Break down latency by gateway, orchestration, retrieval, reranking, model inference, tool calls, validation, and streaming response.

For Hallucination Control System, I would specifically discuss:

gateway/auth latency
context build latency
retrieval latency
reranking latency
model time to first token
model total generation time
tool call latency
safety validation latency
streaming response behavior
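A per-stage latency breakdown can be collected with a small timing helper. The stage names and millisecond budgets below are made-up numbers for illustration, and `LatencyTracker` is a hypothetical helper, not part of any tracing library.

```python
import time
from contextlib import contextmanager

# Hypothetical per-stage budgets in milliseconds.
BUDGET_MS = {"retrieval": 150, "rerank": 100, "model": 2000, "validation": 300}


class LatencyTracker:
    def __init__(self):
        self.spent_ms = {}

    @contextmanager
    def stage(self, name: str):
        # Time one pipeline stage, recording it even if the stage raises.
        start = time.perf_counter()
        try:
            yield
        finally:
            self.spent_ms[name] = (time.perf_counter() - start) * 1000

    def over_budget(self, budget=BUDGET_MS):
        # Stages with no configured budget are never flagged.
        return sorted(
            name
            for name, ms in self.spent_ms.items()
            if ms > budget.get(name, float("inf"))
        )
```

A request would wrap each step, for example `with tracker.stage("retrieval"): ...`, and attach `tracker.spent_ms` to its trace.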

13️⃣ Reliability and Failure Handling

Failures include normal distributed systems failures plus model-specific failures: hallucination, low confidence, empty retrieval, unsafe tool call, context overflow, and bad prompt version.

For Hallucination Control System, I would specifically discuss:

unsupported claims
fake citations
over-refusal
low recall evidence
verifier false negatives
stale knowledge
prompt injection
unsafe confidence
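Fake citations are one of the few failures in this list that can be caught deterministically. The sketch below assumes the model is prompted to cite evidence with numeric markers such as `[1]`; that marker format and the `check_citations` helper are assumptions for illustration.

```python
import re
from typing import Dict, List

CITATION = re.compile(r"\[(\d+)\]")


def check_citations(answer: str, evidence: Dict[int, str]) -> Dict[str, List]:
    """Flag citation ids that were never retrieved and sentences with no citation."""
    fake: List[int] = []
    uncited: List[str] = []
    # Split after sentence-ending punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", answer.strip()):
        if not sentence:
            continue
        ids = [int(m) for m in CITATION.findall(sentence)]
        if not ids:
            uncited.append(sentence)
        fake.extend(i for i in ids if i not in evidence)
    return {"fake_citations": sorted(set(fake)), "uncited_sentences": uncited}
```

This only checks that a cited id exists in the retrieved set. Whether the cited passage actually supports the claim still needs the verifier step.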

14️⃣ Security and Privacy

LLM systems must handle prompt injection, data exfiltration, tenant isolation, sensitive context, tool permissions, audit logs, and safe logging.

For Hallucination Control System, I would specifically discuss:

tenant isolation
least privilege tools
prompt injection defense
sensitive data redaction
safe logging
audit trails
data retention
approval for side effects
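Sensitive data redaction before logging can be sketched with a few regular expressions. The patterns here are deliberately simple assumptions: the `sk-` key prefix is a hypothetical format, and a real deployment would use a vetted PII detection approach rather than two regexes.

```python
import re

# Illustrative patterns only; real systems need broader, tested coverage.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "API_KEY": re.compile(r"\bsk-[A-Za-z0-9]{16,}\b"),  # hypothetical key format
}


def redact(text: str) -> str:
    """Replace each match with a typed placeholder before the text is logged."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)
    return text
```

Typed placeholders keep logs debuggable (the reader can see that an email was present) without retaining the value itself.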

15️⃣ Evaluation and Quality

A Staff-level answer must define quality metrics, offline golden sets, online feedback, regression gates, and rollout/rollback strategy.

For Hallucination Control System, I would specifically discuss:

hallucination rate
unsupported claim rate
citation precision
citation recall
refusal accuracy
false refusal rate
user correction rate
grounded answer rate
offline golden set
online feedback
regression gate
canary rollout
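A few of these metrics can be computed directly from a labeled golden set. The sketch below assumes each example records which evidence ids the answer cited, which ids a labeler marked as relevant, and whether the question was answerable; the `EvalExample` shape is hypothetical, and the metrics are micro-averaged across examples.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class EvalExample:
    cited: Set[str]     # evidence ids the answer cited
    relevant: Set[str]  # evidence ids a labeler marked as supporting
    refused: bool       # whether the system refused to answer
    answerable: bool    # whether the labeler judged the question answerable


def citation_precision(examples: List[EvalExample]) -> float:
    # Share of cited ids that were actually relevant.
    cited = sum(len(e.cited) for e in examples)
    correct = sum(len(e.cited & e.relevant) for e in examples)
    return correct / cited if cited else 0.0


def citation_recall(examples: List[EvalExample]) -> float:
    # Share of relevant ids that the answers cited.
    relevant = sum(len(e.relevant) for e in examples)
    found = sum(len(e.cited & e.relevant) for e in examples)
    return found / relevant if relevant else 0.0


def false_refusal_rate(examples: List[EvalExample]) -> float:
    # Share of answerable questions that were refused anyway.
    answerable = [e for e in examples if e.answerable]
    if not answerable:
        return 0.0
    return sum(e.refused for e in answerable) / len(answerable)
```

A regression gate would then compare these numbers for a candidate prompt or model version against the current baseline before a canary rollout.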

16️⃣ Cost Control

LLM systems need token accounting, model tiering, cache strategy, batching, context trimming, request budgets, and cost attribution by user/team/feature.

For Hallucination Control System, I would specifically discuss:

token accounting
model tiering
semantic cache
prompt compression
output token limits
batching
provider routing
budget alerts
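Token accounting and cost attribution can be sketched as a small meter keyed by feature. The prices and tier names below are invented placeholders, not any provider's real pricing, and `CostMeter` is a hypothetical helper.

```python
from collections import defaultdict

# Hypothetical prices in USD per 1,000 tokens.
PRICES = {
    "small": {"input": 0.0005, "output": 0.0015},
    "large": {"input": 0.01, "output": 0.03},
}


class CostMeter:
    def __init__(self, budget_usd_by_feature):
        self.budget = budget_usd_by_feature
        self.spent = defaultdict(float)

    def record(self, feature: str, tier: str, input_tokens: int, output_tokens: int) -> float:
        """Attribute one model call to a feature and return its cost."""
        price = PRICES[tier]
        cost = (input_tokens / 1000) * price["input"] + (output_tokens / 1000) * price["output"]
        self.spent[feature] += cost
        return cost

    def over_budget(self, feature: str) -> bool:
        # Features without a configured budget never trigger an alert.
        return self.spent[feature] > self.budget.get(feature, float("inf"))
```

In a hallucination control system the verifier and reranker calls should be metered the same way, since verification can easily cost as much as generation.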

17️⃣ Observability

Trace every model call with prompt version, model version, retrieval ids, tool calls, token counts, latency, cost, safety decisions, and final quality signals.

For Hallucination Control System, I would specifically discuss:

request trace
prompt/model versions
retrieval trace
tool trace
token counts
latency breakdown
cost breakdown
safety decisions
quality feedback

18️⃣ Trade-offs

Compare quality, latency, cost, safety, freshness, determinism, explainability, and operational complexity.

For Hallucination Control System, I would specifically discuss:

quality vs latency
accuracy vs cost
freshness vs stability
automation vs control
model flexibility vs deterministic rules
context richness vs token budget
privacy vs personalization
speed vs verification

19️⃣ Staff-Level Framing

The Staff-level answer is not that LLMs are smarter. It is that the system creates deterministic boundaries around probabilistic model behavior.

For Hallucination Control System, I would specifically discuss:

separate deterministic guarantees from probabilistic output
define system-owned boundaries
make quality measurable
make cost attributable
make failures recoverable
make rollouts reversible
make traces debuggable
avoid demo-only architecture

20️⃣ Traditional vs LLM Design Comparison

Area	Traditional SD	LLM System Design
Core logic	deterministic business logic	model-assisted probabilistic reasoning
Input	structured API payload	messages, prompts, retrieved context, tool outputs
State	DB rows, cache entries, queue messages	conversation, memory, prompt/model/retrieval/tool traces
Correctness	unit tests and business rules	grounding, evaluation, validation, safety policy
Scaling	services, DB, cache, queues	plus inference capacity, token budget, vector indexes, tool fan-out
Cost	CPU, storage, network	plus input/output tokens, model tier, reranking, embeddings
Debugging	logs, metrics, traces	plus prompt version, model version, context, tool calls, eval labels
Failure	timeout, DB error, queue lag	plus hallucination, unsafe output, context overflow, low confidence

21️⃣ Common Interview Follow-ups

Q: How do you start the answer?

A: Start with the traditional system baseline, then explicitly state what changes because this is an LLM system.

Q: Where does the LLM fit?

A: Behind an orchestration layer, not directly exposed as the system itself.

Q: How do you control quality?

A: Use offline evals, online feedback, regression gates, groundedness checks, and prompt/model version tracking.

Q: How do you control cost?

A: Track tokens, route models by task complexity, cache safe results, compress context, and enforce budgets.

Q: How do you handle hallucination?

A: Ground the answer when possible, validate claims, add citation checks, use refusal policy, and measure hallucination rate.

Q: How do you debug bad output?

A: Inspect the full trace: user input, context, prompt version, model version, retrieval results, tool calls, safety decisions, and final output.

Q: What is the Staff-level insight?

A: A Staff-level hallucination control design does not promise zero hallucination. It defines risk classes, evidence requirements, validation gates, refusal behavior, and measurable quality targets.

Chinese Section

🎯 Hallucination Control System

The core of this topic is how to upgrade a traditional system design answer into an LLM system design answer.

Traditional SD baseline:

A traditional correctness system relies on validation, constraints, tests, source-of-truth databases, and deterministic business rules.

LLM system upgrade:

Hallucination control adds retrieval grounding, citations, uncertainty handling, answer verification, constrained decoding, refusal policy, and post-generation validators.

1️⃣ Chinese Core Framework

When discussing a hallucination control system, I structure the answer like this:

start with the traditional system design baseline
then cover the components the LLM system adds
draw the full request path
explain context, state, model, tool, and memory
cover safety, eval, cost, and observability
finish with Staff-level trade-offs
