Traditional SD to LLM System Design

🎯 AI Agent System Design

1️⃣ Core Upgrade Framework

This topic covers how to upgrade a traditional system design (SD) answer into an LLM system design answer, using AI agent system design as the example.

Traditional SD baseline:

A traditional workflow engine executes predefined steps with state machines, queues, workers, retries, and audit logs.

LLM system upgrade:

An AI agent system adds model-driven planning, dynamic tool selection, observations, memory, bounded loops, safety gates, and human approval for high-risk actions.

I usually structure the answer around eight dimensions:

traditional baseline
LLM-specific components
request lifecycle
context and state
safety and correctness
evaluation and observability
cost and latency
Staff-level trade-offs

👉 Interview Answer

For AI Agent System Design, I would start from the traditional system design baseline, then explain what changes when the core capability is powered by an LLM. The traditional parts are still necessary: API gateway, services, storage, cache, queues, scaling, and reliability. The LLM-specific parts are prompt/context construction, model routing, retrieval or memory, tool execution, safety checks, evaluation, token cost, and observability. That is the main upgrade from traditional SD to LLM system design.

2️⃣ Traditional SD Baseline

A traditional workflow engine executes predefined steps with state machines, queues, workers, retries, and audit logs.

A normal system design answer would usually cover:

client and API gateway
authentication and authorization
stateless application services
database and schema design
cache layer
message queue for async work
rate limiting and backpressure
horizontal scaling
replication and failover
metrics, logs, and alerts

What is good about this answer?

It is structured.
It covers scale and reliability.
It has clear storage and service boundaries.
It can be implemented with conventional distributed systems patterns.

What is missing for LLM systems?

no prompt/context layer
no model routing
no token budget
no model quality evaluation
no hallucination control
no tool/memory safety boundary
no cost-per-token control
no trace of model behavior

👉 Interview Answer

A traditional SD answer is still the foundation. But for an LLM system, that answer is incomplete because the model introduces probabilistic behavior, token constraints, quality evaluation, safety concerns, and cost dynamics. So I would keep the distributed systems foundation and add an LLM orchestration layer on top.

3️⃣ LLM System Upgrade

An AI agent system adds model-driven planning, dynamic tool selection, observations, memory, bounded loops, safety gates, and human approval for high-risk actions.

The main upgrade is to add an AI orchestration layer:

prompt builder
context manager
model router
retrieval or memory layer
tool-calling layer when needed
safety and policy layer
evaluation layer
usage and cost metering
trace and debugging layer

Traditional System
  ↓
LLM Orchestration Layer
  ↓
Model / Retrieval / Tools / Memory
  ↓
Validated User-Facing Output

👉 Interview Answer

The important shift is that the model is not the whole system. The model is one execution component behind an orchestration layer. The orchestration layer decides what context enters the prompt, which model is used, which tools are allowed, how outputs are validated, and how quality is measured.

4️⃣ High-Level Architecture

User gives goal
  ↓
Agent creates plan
  ↓
Router selects tool
  ↓
Policy engine checks permission
  ↓
Executor calls tool
  ↓
Observation is parsed
  ↓
State is updated
  ↓
Loop controller decides continue/stop
  ↓
Final answer is generated
  ↓
Trace and metrics are stored
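
The flow above can be sketched as a bounded loop. This is a minimal illustration under stated assumptions, not a reference implementation: `plan_next_step` is a stub standing in for a model call, and `TOOLS`, `ALLOWED_TOOLS`, `MAX_STEPS`, and `AgentState` are hypothetical names chosen for this example. Final answer generation is left out to keep the sketch short.

```python
from dataclasses import dataclass, field

MAX_STEPS = 5  # assumed step budget; the loop controller enforces it

# Hypothetical tool registry: tool name -> plain Python callable.
TOOLS = {
    "search": lambda query: f"results for {query}",
    "delete_account": lambda user: f"deleted {user}",
}
ALLOWED_TOOLS = {"search"}  # assumed policy: only read-only tools are permitted


@dataclass
class AgentState:
    goal: str
    observations: list = field(default_factory=list)
    trace: list = field(default_factory=list)


def plan_next_step(state):
    """Stub planner standing in for a model call: search once, then stop."""
    if not state.observations:
        return {"tool": "search", "arg": state.goal}
    return None  # None means the planner considers the goal done


def is_allowed(tool_name):
    """Policy check owned by the system, not by the model."""
    return tool_name in ALLOWED_TOOLS


def run_agent(goal):
    state = AgentState(goal=goal)
    for step in range(MAX_STEPS):  # bounded loop: never more than MAX_STEPS
        action = plan_next_step(state)
        if action is None:
            break
        if not is_allowed(action["tool"]):
            state.trace.append((step, action["tool"], "blocked"))
            break
        observation = TOOLS[action["tool"]](action["arg"])
        state.observations.append(observation)
        state.trace.append((step, action["tool"], "ok"))
    return state
```

Calling `run_agent("refund policy")` runs one search step and then stops because the stub planner returns `None`. The step limit and the permission check live in system code, so a planner that never stops or that picks a forbidden tool still terminates.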

Core components:

agent API
planner
state store
tool registry
tool router
executor
observation parser
memory service
policy engine
human approval queue
trace logger
retry/fallback manager

Architecture principle:

Separate these layers clearly:

product/API layer
orchestration layer
model execution layer
retrieval/memory/tool layer
safety/evaluation layer
observability/cost layer

👉 Interview Answer

For AI Agent System Design, I would design a layered architecture. The product layer handles users and requests. The orchestration layer builds context and controls model/tool execution. The model layer generates or reasons. The safety, evaluation, and observability layers make the system production-grade.

5️⃣ Traditional SD Starting Point

Start from the system design answer an interviewer already expects: clients, API gateway, stateless services, storage, cache, queue, scaling, reliability, and observability.

For AI Agent System Design, I would specifically discuss:

API gateway and auth
service decomposition
storage choice
cache strategy
queue and async processing
rate limiting
SLO and failure handling
deployment and rollback

Staff-level detail:

A Staff-level agent design models the agent as a bounded state machine. The model may propose actions, but the system owns permissions, idempotency, budgets, loop termination, approval, and auditability.
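
One way to make the bounded state machine concrete is an explicit transition table that the system checks before accepting any step the model proposes. The sketch below is an illustration under assumptions: the `Phase` names, the `ALLOWED` table, and the `max_transitions` budget are hypothetical and not taken from any particular framework.

```python
from enum import Enum


class Phase(Enum):
    PLANNING = "planning"
    AWAITING_APPROVAL = "awaiting_approval"
    EXECUTING = "executing"
    DONE = "done"
    FAILED = "failed"


# Assumed transition table; a real system would define its own.
ALLOWED = {
    Phase.PLANNING: {Phase.EXECUTING, Phase.AWAITING_APPROVAL, Phase.DONE, Phase.FAILED},
    Phase.AWAITING_APPROVAL: {Phase.EXECUTING, Phase.FAILED},
    Phase.EXECUTING: {Phase.PLANNING, Phase.FAILED},
    Phase.DONE: set(),
    Phase.FAILED: set(),
}


class AgentRun:
    def __init__(self, max_transitions=20):
        self.phase = Phase.PLANNING
        self.transitions = 0
        self.max_transitions = max_transitions

    def advance(self, proposed):
        """Apply a proposed transition only if the system allows it."""
        if self.transitions >= self.max_transitions:
            self.phase = Phase.FAILED  # budget exhausted: terminate the loop
            return self.phase
        if proposed not in ALLOWED[self.phase]:
            raise ValueError(f"illegal transition {self.phase.name} -> {proposed.name}")
        self.phase = proposed
        self.transitions += 1
        return self.phase
```

The model can only propose the next phase. Whether the proposal is legal, and whether the run still has budget, is decided by `advance`.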

Memorize this answer:

For AI Agent System Design, I would not replace traditional system design with an LLM call. I would keep the traditional architecture, then add an orchestration layer that controls prompts, context, model routing, tools or memory, validation, evaluation, and cost. The Staff-level goal is to make probabilistic model behavior safe, measurable, debuggable, and cost-controlled.

6️⃣ What Changes in LLM System Design

Then explain what changes when the core intelligence is a model call: prompts, tokens, context, model routing, retrieval, tools, memory, safety, evaluation, and cost.

For AI Agent System Design, I would specifically discuss:

prompt versioning
context construction
token budget
model selection
retrieval/memory/tool integration
safety validation
quality evaluation
cost attribution


7️⃣ Request Lifecycle

Trace one request end to end so the design is not abstract. A strong answer shows exactly where context is built, where the model is called, and where outputs are validated.

For AI Agent System Design, I would specifically discuss:

User gives goal
Agent creates plan
Router selects tool
Policy engine checks permission
Executor calls tool
Observation is parsed
State is updated
Loop controller decides continue/stop
Final answer is generated
Trace and metrics are stored


8️⃣ Data and State Model

LLM systems create new state: conversation turns, prompt versions, model versions, retrieval traces, tool calls, memory records, evaluation labels, token usage, and safety decisions.

For AI Agent System Design, I would specifically discuss:

request id
user id / tenant id
conversation or task id
prompt version
model version
retrieval ids
tool call ids
memory ids
token counts
safety decision
quality label
cost record
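
These fields could be captured in one record per model call. The sketch below assumes a flat `ModelCallRecord` dataclass; the field names and defaults are illustrative, and a real schema would depend on the storage layer.

```python
from dataclasses import dataclass, field, asdict
from typing import Optional


@dataclass
class ModelCallRecord:
    # Identity and ownership
    request_id: str
    tenant_id: str
    task_id: str
    # Versions needed to reproduce the call
    prompt_version: str
    model_version: str
    # References to what entered the context
    retrieval_ids: list = field(default_factory=list)
    tool_call_ids: list = field(default_factory=list)
    memory_ids: list = field(default_factory=list)
    # Usage, safety, quality, and cost
    input_tokens: int = 0
    output_tokens: int = 0
    safety_decision: str = "allow"
    quality_label: Optional[str] = None
    cost_usd: float = 0.0


def to_row(record):
    """Flatten a record into a dict that a log or table writer could store."""
    return asdict(record)
```

Keeping prompt and model versions on every record is what later makes a bad output reproducible rather than anecdotal.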


9️⃣ Online Path

The online path is latency-sensitive. It should be bounded, permission-safe, observable, and able to fall back when a model, retriever, or tool fails.

For AI Agent System Design, I would specifically discuss:

explicit ownership
bounded execution
safe defaults
versioned behavior
measurable quality
debuggable traces
fallback path
operational readiness
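
A fallback path can be as simple as trying providers in priority order. In this sketch, `call_with_fallback`, `flaky_primary`, and `small_backup` are hypothetical stand-ins for real model clients, and the canned final message is an assumed safe default.

```python
def flaky_primary(prompt):
    """Stand-in for a primary model client that is currently failing."""
    raise TimeoutError("primary timed out")


def small_backup(prompt):
    """Stand-in for a cheaper backup model client."""
    return f"backup answer to: {prompt}"


def call_with_fallback(prompt, providers):
    """Try each (name, callable) provider in order; return the first success."""
    errors = []
    for name, call in providers:
        try:
            return {"provider": name, "text": call(prompt), "errors": errors}
        except Exception as exc:  # a real system would catch narrower errors
            errors.append(f"{name}: {exc}")
    # Every provider failed: return a safe default instead of raising.
    return {"provider": None, "text": "Sorry, I cannot answer right now.", "errors": errors}
```

Recording which provider answered, and which ones failed first, keeps the fallback visible in traces instead of silently changing quality.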


10️⃣ Offline Path

The offline path handles indexing, embedding, eval set construction, prompt testing, model rollout, cache warming, analytics, and quality regression checks.

For AI Agent System Design, I would specifically discuss:

explicit ownership
bounded execution
safe defaults
versioned behavior
measurable quality
debuggable traces
fallback path
operational readiness


11️⃣ Scaling Strategy

Traditional scaling is still necessary, but LLM systems add expensive model execution, token limits, GPU/provider bottlenecks, vector indexes, and tool fan-out.

For AI Agent System Design, I would specifically discuss:

stateless service horizontal scaling
queue-based async processing
cache hot paths
model-provider concurrency limits
GPU or inference capacity
vector index sharding
tool fan-out control
backpressure and admission control
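
Admission control for a concurrency-limited model provider can be sketched with a semaphore that rejects rather than queues. `AdmissionController` and its `max_in_flight` parameter are hypothetical names; the cap itself would come from the provider's actual limits.

```python
import threading


class AdmissionController:
    """Reject work instead of queueing it once the concurrency cap is reached."""

    def __init__(self, max_in_flight):
        self._slots = threading.BoundedSemaphore(max_in_flight)

    def try_admit(self):
        # Non-blocking acquire: returns False immediately when no slot is free.
        return self._slots.acquire(blocking=False)

    def release(self):
        # Call once per admitted request, typically in a finally block.
        self._slots.release()
```

Rejecting early lets the caller shed load or degrade gracefully, which is usually cheaper than letting requests pile up behind a saturated model endpoint.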


12️⃣ Latency Budget

Break down latency by gateway, orchestration, retrieval, reranking, model inference, tool calls, validation, and streaming response.

For AI Agent System Design, I would specifically discuss:

gateway/auth latency
context build latency
retrieval latency
reranking latency
model time to first token
model total generation time
tool call latency
safety validation latency
streaming response behavior
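
A latency budget becomes actionable once each stage has a target that measurements are compared against. The stage names and millisecond values below are illustrative assumptions, not recommended numbers.

```python
# Assumed per-stage targets in milliseconds; the numbers are illustrative only.
BUDGET_MS = {
    "gateway_auth": 20,
    "context_build": 50,
    "retrieval": 80,
    "rerank": 60,
    "model_first_token": 400,
    "tool_call": 300,
    "safety_validation": 40,
}


def over_budget(measured_ms, budget_ms=BUDGET_MS):
    """Return {stage: overshoot_ms} for stages that exceeded their budget."""
    return {
        stage: measured_ms[stage] - limit
        for stage, limit in budget_ms.items()
        if measured_ms.get(stage, 0) > limit
    }


def total_budget(budget_ms=BUDGET_MS):
    """Sum of stage budgets, assuming the stages run sequentially."""
    return sum(budget_ms.values())
```

Stages that run in parallel, or tool calls that repeat inside the agent loop, would need a different aggregation than a plain sum.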


13️⃣ Reliability and Failure Handling

Failures include normal distributed systems failures plus model-specific failures: hallucination, low confidence, empty retrieval, unsafe tool call, context overflow, and bad prompt version.

For AI Agent System Design, I would specifically discuss:

infinite loop
wrong tool call
unsafe side effect
non-idempotent retry
state drift
tool hallucination
excessive cost
low-quality plan
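
For the non-idempotent retry case, one common guard is an idempotency key derived from the task and step, so a retried tool call returns the stored result instead of repeating the side effect. The sketch assumes an in-memory store and a hypothetical `IdempotentExecutor` class; a production system would need durable, concurrency-safe storage.

```python
class IdempotentExecutor:
    """Run a side-effecting tool at most once per idempotency key."""

    def __init__(self):
        # In-memory for the sketch; not durable and not thread-safe.
        self._results = {}

    def execute(self, idempotency_key, tool, *args):
        if idempotency_key in self._results:
            # Retry of a step that already ran: return the stored result.
            return self._results[idempotency_key]
        result = tool(*args)
        self._results[idempotency_key] = result
        return result
```

With a key such as `"task-1:step-2"`, a retry after a timeout cannot charge a customer twice, because the second call never reaches the tool.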


14️⃣ Security and Privacy

LLM systems must handle prompt injection, data exfiltration, tenant isolation, sensitive context, tool permissions, audit logs, and safe logging.

For AI Agent System Design, I would specifically discuss:

tenant isolation
least privilege tools
prompt injection defense
sensitive data redaction
safe logging
audit trails
data retention
approval for side effects
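
Least-privilege tools and approval for side effects can be expressed as a deterministic check that runs before the executor. The registry contents, role names, and the three decision strings below are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolSpec:
    name: str
    allowed_roles: frozenset
    has_side_effect: bool


# Hypothetical registry: which roles may call which tools.
REGISTRY = {
    "read_ticket": ToolSpec("read_ticket", frozenset({"support", "admin"}), False),
    "issue_refund": ToolSpec("issue_refund", frozenset({"admin"}), True),
}


def check_tool_call(role, tool_name, approved=False):
    """Return 'allow', 'deny', or 'needs_approval' for a proposed tool call."""
    spec = REGISTRY.get(tool_name)
    if spec is None:
        return "deny"  # unknown tool, possibly hallucinated by the model
    if role not in spec.allowed_roles:
        return "deny"  # least privilege: role is not granted this tool
    if spec.has_side_effect and not approved:
        return "needs_approval"  # route to the human approval queue
    return "allow"
```

Because the check does not depend on model output beyond the tool name, a prompt injection can change what the model asks for but not what the system permits.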


15️⃣ Evaluation and Quality

A Staff-level answer must define quality metrics, offline golden sets, online feedback, regression gates, and rollout/rollback strategy.

For AI Agent System Design, I would specifically discuss:

task completion rate
steps per task
tool success rate
loop timeout rate
human escalation rate
unsafe tool block rate
cost per completed task
state drift incidents
offline golden set
online feedback
regression gate
canary rollout
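
Several of these metrics are simple aggregates over per-task run records, and a regression gate is a comparison against a baseline. The record fields (`completed`, `steps`, `cost_usd`) and the `max_drop` threshold are hypothetical.

```python
def summarize(runs):
    """Aggregate per-task run records into a few agent quality metrics."""
    completed = [r for r in runs if r["completed"]]
    total_cost = sum(r["cost_usd"] for r in runs)
    return {
        "task_completion_rate": len(completed) / len(runs),
        "steps_per_task": sum(r["steps"] for r in runs) / len(runs),
        # Failed tasks still cost money, so divide total cost by completed tasks.
        "cost_per_completed_task": total_cost / len(completed) if completed else float("inf"),
    }


def passes_gate(candidate, baseline, max_drop=0.02):
    """Block a rollout if completion rate drops more than max_drop below baseline."""
    return candidate["task_completion_rate"] >= baseline["task_completion_rate"] - max_drop
```

Running `summarize` over the golden set for both the current and the candidate prompt or model version gives the two inputs that `passes_gate` compares.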


16️⃣ Cost Control

LLM systems need token accounting, model tiering, cache strategy, batching, context trimming, request budgets, and cost attribution by user/team/feature.

For AI Agent System Design, I would specifically discuss:

token accounting
model tiering
semantic cache
prompt compression
output token limits
batching
provider routing
budget alerts
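
Token accounting, model tiering, and a per-request budget can be sketched together. The tier names and prices below are made up for the example and do not reflect any provider's pricing.

```python
# Hypothetical tiers and prices (USD per 1,000 tokens); not real provider pricing.
PRICES = {
    "small": {"input": 0.0005, "output": 0.0015},
    "large": {"input": 0.005, "output": 0.015},
}


def pick_tier(task_complexity):
    """Route only high-complexity tasks to the expensive tier."""
    return "large" if task_complexity == "high" else "small"


def call_cost(tier, input_tokens, output_tokens):
    """Cost of one model call in USD, from token counts and tier prices."""
    price = PRICES[tier]
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1000


class Budget:
    """Per-request or per-task spending cap enforced by the system."""

    def __init__(self, limit_usd):
        self.limit_usd = limit_usd
        self.spent_usd = 0.0

    def charge(self, amount_usd):
        """Record the spend and return True, or return False if it would exceed the cap."""
        if self.spent_usd + amount_usd > self.limit_usd:
            return False
        self.spent_usd += amount_usd
        return True
```

In an agent loop the budget check would run before each model or tool call, so an expensive plan stops at the cap instead of being discovered on the invoice.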


17️⃣ Observability

Trace every model call with prompt version, model version, retrieval ids, tool calls, token counts, latency, cost, safety decisions, and final quality signals.

For AI Agent System Design, I would specifically discuss:

request trace
prompt/model versions
retrieval trace
tool trace
token counts
latency breakdown
cost breakdown
safety decisions
quality feedback


18️⃣ Trade-offs

Compare quality, latency, cost, safety, freshness, determinism, explainability, and operational complexity.

For AI Agent System Design, I would specifically discuss:

quality vs latency
accuracy vs cost
freshness vs stability
automation vs control
model flexibility vs deterministic rules
context richness vs token budget
privacy vs personalization
speed vs verification


19️⃣ Staff-Level Framing

The Staff-level answer is not that LLMs are smarter. It is that the system creates deterministic boundaries around probabilistic model behavior.

For AI Agent System Design, I would specifically discuss:

separate deterministic guarantees from probabilistic output
define system-owned boundaries
make quality measurable
make cost attributable
make failures recoverable
make rollouts reversible
make traces debuggable
avoid demo-only architecture


20️⃣ Traditional vs LLM Design Comparison

Area	Traditional SD	LLM System Design
Core logic	deterministic business logic	model-assisted probabilistic reasoning
Input	structured API payload	messages, prompts, retrieved context, tool outputs
State	DB rows, cache entries, queue messages	conversation, memory, prompt/model/retrieval/tool traces
Correctness	unit tests and business rules	grounding, evaluation, validation, safety policy
Scaling	services, DB, cache, queues	plus inference capacity, token budget, vector indexes, tool fan-out
Cost	CPU, storage, network	plus input/output tokens, model tier, reranking, embeddings
Debugging	logs, metrics, traces	plus prompt version, model version, context, tool calls, eval labels
Failure	timeout, DB error, queue lag	plus hallucination, unsafe output, context overflow, low confidence

21️⃣ Common Interview Follow-ups

Q: How do you start the answer?

A: Start with the traditional system baseline, then explicitly state what changes because this is an LLM system.

Q: Where does the LLM fit?

A: Behind an orchestration layer, not directly exposed as the system itself.

Q: How do you control quality?

A: Use offline evals, online feedback, regression gates, groundedness checks, and prompt/model version tracking.

Q: How do you control cost?

A: Track tokens, route models by task complexity, cache safe results, compress context, and enforce budgets.

Q: How do you handle hallucination?

A: Ground the answer when possible, validate claims, add citation checks, use refusal policy, and measure hallucination rate.
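
A citation check can be partly mechanical: every source the answer cites must be among the documents that were actually retrieved. The sketch assumes citations appear as `[doc-N]` markers, which is a made-up convention. It only detects citations to unknown sources; it does not verify that a cited document supports the claim.

```python
import re


def check_citations(answer, retrieved_ids):
    """Return cited ids in the answer that were not in the retrieved set."""
    cited = set(re.findall(r"\[(doc-\d+)\]", answer))
    return sorted(cited - set(retrieved_ids))
```

A non-empty result is a cheap signal to refuse, regenerate, or flag the answer before it reaches the user.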

Q: How do you debug bad output?

A: Inspect the full trace: user input, context, prompt version, model version, retrieval results, tool calls, safety decisions, and final output.

Q: What is the Staff-level insight?

A: A Staff-level agent design models the agent as a bounded state machine. The model may propose actions, but the system owns permissions, idempotency, budgets, loop termination, approval, and auditability.

22️⃣ Answer Bank for Memorization

Memorization Paragraph 1

For AI Agent System Design, I would start with the traditional system design baseline and then upgrade it with LLM-specific components. The traditional baseline gives us API gateway, auth, stateless services, storage, cache, queues, scaling, and reliability. The LLM upgrade adds prompt/context construction, model routing, retrieval or memory, tool execution, safety checks, evaluation, token cost, and model observability.

Memorization Paragraph 2

The key design principle for AI Agent System Design is that the model should not be treated as the whole system. The model is one component behind an orchestration layer. The orchestration layer owns context, permissions, model choice, tool access, validation, cost control, and traces.

Memorization Paragraph 3

Compared with traditional SD, AI Agent System Design introduces probabilistic quality. That means I need both system metrics and quality metrics: latency, error rate, and cost on one side; groundedness, safety, user correction rate, and task success on the other side.

Memorization Paragraph 4

At Staff level, I would separate deterministic guarantees from model behavior. The system must deterministically enforce authorization, budgets, schema validation, logging, and rollback. The model can generate or reason, but the platform must make that behavior bounded and measurable.

Memorization Paragraph 5

I would design AI Agent System Design with both online and offline paths. The online path handles user requests under latency and safety constraints. The offline path builds indexes, eval datasets, prompt tests, analytics, cost dashboards, and regression gates so the system improves without silent quality regression.

23️⃣ Senior / Staff-Level Summary Answer

For AI Agent System Design, I would upgrade a traditional system design answer by adding an LLM orchestration layer. The traditional system still handles API, auth, services, storage, caching, queues, scaling, and reliability. The LLM layer handles prompt/context construction, model routing, retrieval or memory, tool execution, safety, evaluation, token cost, and observability. The Staff-level point is to make probabilistic model behavior safe, measurable, debuggable, and cost-controlled through deterministic system boundaries.

Chinese Section

🎯 AI Agent System Design

The core of this topic is how to upgrade a traditional system design answer into an LLM system design answer.

Traditional SD baseline:

A traditional workflow engine executes predefined steps with state machines, queues, workers, retries, and audit logs.

LLM system upgrade:

An AI agent system adds model-driven planning, dynamic tool selection, observations, memory, bounded loops, safety gates, and human approval for high-risk actions.

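1️⃣ Core Framework

When discussing AI Agent System Design, I structure the answer like this:

start with the traditional system design baseline
then cover the components an LLM system adds
draw the full request path
explain context, state, model, tools, and memory
cover safety, eval, cost, and observability
finish with Staff-level trade-offs

Answer to memorize:

For AI Agent System Design, I start from traditional system design rather than jumping straight to calling an LLM. The traditional part covers gateway, services, DB, cache, queue, scaling, and reliability. The LLM upgrade covers prompt, context, model routing, retrieval, memory, tool calling, safety, eval, token cost, and observability. The Staff-level point is to wrap probabilistic model behavior in deterministic system boundaries.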

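2️⃣ Traditional SD Starting Point

Start from traditional system design: client, gateway, stateless services, DB, cache, queue, scaling, reliability, and observability.

For AI Agent System Design, cover specifically:

API gateway
auth / rate limit
stateless services
DB / cache / queue
replication / failover
SLO / alerting

Paragraph to memorize:

When discussing AI Agent System Design, I explicitly separate the traditional system parts from the LLM-specific parts. The traditional system provides stable services, storage, caching, queues, and scaling. LLM orchestration handles prompt, context, model, retrieval, tools, memory, safety, eval, and cost. This answer fits Staff-level system design better than simply saying "call a large model."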

3️⃣ What Changes After Upgrading to an LLM System

An LLM system adds prompts, tokens, context, retrieval, tools, memory, safety, eval, and cost control.

For AI Agent System Design, cover specifically:

prompt builder
context manager
model router
retrieval / memory
tool calling
safety validation
eval pipeline
token cost control


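4️⃣ Request Path

In an interview, always trace one complete request and show where context is built, where the model is called, and where the output is validated.

For AI Agent System Design, cover specifically: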

User gives goal
Agent creates plan
Router selects tool
Policy engine checks permission
Executor calls tool
Observation is parsed
State is updated
Loop controller decides continue/stop
Final answer is generated
Trace and metrics are stored


5️⃣ Data and State Model

An LLM system adds state such as conversations, prompt versions, model versions, retrieval traces, tool calls, memory, token usage, and safety decisions.

For AI Agent System Design, cover specifically:

request id
tenant / user id
conversation / task id
prompt version
model version
retrieval trace
tool call trace
token usage
cost record
safety decision


6️⃣ Online Path

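The online path focuses on latency, permissions, bounded execution, fallback, and streaming.

For AI Agent System Design, cover specifically:

clear boundaries
measurable quality
traceable request path
rollback-ready versions
controlled cost
failure handling
permission safety
production operability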


7️⃣ Offline Path

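The offline path handles indexing, embedding, eval, prompt tests, model rollout, cache warming, and regression checks.

For AI Agent System Design, cover specifically:

clear boundaries
measurable quality
traceable request path
rollback-ready versions
controlled cost
failure handling
permission safety
production operability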


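8️⃣ Scaling Strategy

Traditional horizontal scaling still matters, but an LLM system also has to handle model inference cost, token limits, vector indexes, tool fan-out, and provider bottlenecks.

For AI Agent System Design, cover specifically:

clear boundaries
measurable quality
traceable request path
rollback-ready versions
controlled cost
failure handling
permission safety
production operability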


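9️⃣ Latency Budget

Break latency down across gateway, orchestrator, retriever, reranker, model inference, tool calls, validation, and streaming.

For AI Agent System Design, cover specifically:

clear boundaries
measurable quality
traceable request path
rollback-ready versions
controlled cost
failure handling
permission safety
production operability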


10️⃣ Reliability and Failure Handling

Beyond traditional timeouts and retries, handle hallucination, empty retrieval, context overflow, unsafe tool calls, and bad prompt versions.

For AI Agent System Design, cover specifically:

infinite loop
wrong tool call
unsafe side effect
non-idempotent retry
state drift
tool hallucination
excessive cost
low-quality plan


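11️⃣ Security and Privacy

Focus on prompt injection, data-leak prevention, tenant isolation, tool permissions, audit, and safe logging.

For AI Agent System Design, cover specifically:

clear boundaries
measurable quality
traceable request path
rollback-ready versions
controlled cost
failure handling
permission safety
production operability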


12️⃣ Evaluation

A Staff-level answer must cover the offline golden set, online feedback, quality metrics, regression gates, and rollback.

For AI Agent System Design, cover specifically:

task completion rate
steps per task
tool success rate
loop timeout rate
human escalation rate
unsafe tool block rate
cost per completed task
state drift incidents
golden set
regression gate
canary rollout
online feedback


13️⃣ Cost Control

This requires token accounting, model routing, caching, batching, context trimming, request budgets, and cost attribution.

For AI Agent System Design, cover specifically:

token accounting
model tiering
semantic cache
context compression
output token limit
batching
budget alert
cost attribution


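14️⃣ Observability

Trace model version, prompt version, retrieval ids, tool calls, tokens, latency, cost, safety, and quality.

For AI Agent System Design, cover specifically:

clear boundaries
measurable quality
traceable request path
rollback-ready versions
controlled cost
failure handling
permission safety
production operability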


15️⃣ Trade-off

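Compare quality, latency, cost, safety, freshness, determinism, explainability, and operational complexity.

For AI Agent System Design, cover specifically:

quality vs latency
accuracy vs cost
automation vs control
context richness vs token budget
personalization vs privacy
speed vs verification
flexibility vs determinism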


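16️⃣ Staff-Level Framing

The core of a Staff-level answer is to wrap probabilistic model behavior in deterministic system boundaries.

For AI Agent System Design, cover specifically:

clear boundaries
measurable quality
traceable request path
rollback-ready versions
controlled cost
failure handling
permission safety
production operability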


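17️⃣ Advanced Follow-ups

Q: How do you open?

A: State the traditional SD baseline first, then say which layers the LLM system adds.

Q: Where does the LLM sit in the architecture?

A: The LLM is an execution component behind the orchestration layer, not the whole system.

Q: How do you control quality?

A: Offline eval, online feedback, grounding, safety checks, and regression gates.

Q: How do you control cost?

A: Token budgets, model routing, caching, context compression, batching, and cost attribution.

Q: How do you debug?

A: Inspect the full trace: input, context, prompt version, model version, retrieval, tool calls, safety, and output.

Q: What is the Staff-level insight?

A: Model the agent as a bounded state machine. The model may propose actions, but the system owns permissions, idempotency, budgets, loop termination, approval, and auditability.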

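18️⃣ Answer Bank for Memorization

Memorization Paragraph 1

An answer on AI Agent System Design cannot stop at traditional SD, and it cannot just say "call an LLM." My approach is to establish the traditional system design baseline first, then add an LLM orchestration layer. The traditional part keeps the service stable, the LLM part provides the intelligence, and the system boundaries own safety, cost, evaluation, and observability.

Memorization Paragraph 2

Traditional system design focuses mainly on APIs, services, databases, caches, queues, and scaling. After upgrading to AI Agent System Design, you also need to cover prompt version, model version, context window, retrieval, memory, tool calling, safety policy, eval pipeline, and token cost.

Memorization Paragraph 3

The focus of a Staff-level answer is separating deterministic system guarantees from probabilistic model behavior. Permissions, budgets, schemas, audit, and rollback must be guaranteed deterministically by the system; the model only generates, understands, or reasons.

Memorization Paragraph 4

I split AI Agent System Design into an online path and an offline path. The online path handles user requests, latency, safety, and fallback; the offline path handles indexing, eval, prompt tests, cost analysis, quality regression, and rollout.

Memorization Paragraph 5

If AI Agent System Design has a quality problem, I debug through the trace: what the user input was, what the context was, which prompt and model versions ran, what retrieval and the tools returned, what the safety decision was, and why the final output passed.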

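19️⃣ Staff-Level Summary

The Staff-level answer for AI Agent System Design is the transition from traditional SD to LLM system design. Traditional SD provides the stable distributed systems foundation. The LLM system adds prompt, context, model, retrieval, memory, tools, safety, eval, cost, and observability. The point is not how smart the model is, but how the system makes model behavior controllable, measurable, debuggable, and reversible.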

note:

AI Agent System Design = Traditional SD baseline + LLM orchestration layer + safety/eval/cost/observability.

User gives goal
  ↓
Agent creates plan
  ↓
Router selects tool
  ↓
Policy engine checks permission
  ↓
Executor calls tool
  ↓
Observation is parsed
  ↓
State is updated
  ↓
Loop controller decides continue/stop
  ↓
Final answer is generated
  ↓
Trace and metrics are stored

Remember:

Start from traditional SD.
Add LLM-specific layers.
Trace one request end to end.
Separate deterministic guarantees from probabilistic model behavior.
End with Staff-level trade-offs.
