🎯 AI Agent System Design
1️⃣ Core Upgrade Framework
This topic is about upgrading a traditional system design answer into an LLM system design answer for AI Agent System Design.
Traditional SD baseline:
A traditional workflow engine executes predefined steps with state machines, queues, workers, retries, and audit logs.
LLM system upgrade:
An AI agent system adds model-driven planning, dynamic tool selection, observations, memory, bounded loops, safety gates, and human approval for high-risk actions.
I usually structure the answer around eight dimensions:
- traditional baseline
- LLM-specific components
- request lifecycle
- context and state
- safety and correctness
- evaluation and observability
- cost and latency
- Staff-level trade-offs
👉 Interview Answer
For AI Agent System Design, I would start from the traditional system design baseline, then explain what changes when the core capability is powered by an LLM. The traditional parts are still necessary: API gateway, services, storage, cache, queues, scaling, and reliability. The LLM-specific parts are prompt/context construction, model routing, retrieval or memory, tool execution, safety checks, evaluation, token cost, and observability. That is the main upgrade from traditional SD to LLM system design.
2️⃣ Traditional SD Baseline
A traditional workflow engine executes predefined steps with state machines, queues, workers, retries, and audit logs.
A normal system design answer would usually cover:
- client and API gateway
- authentication and authorization
- stateless application services
- database and schema design
- cache layer
- message queue for async work
- rate limiting and backpressure
- horizontal scaling
- replication and failover
- metrics, logs, and alerts
What is good about this answer?
- It is structured.
- It covers scale and reliability.
- It has clear storage and service boundaries.
- It can be implemented with conventional distributed systems patterns.
What is missing for LLM systems?
- no prompt/context layer
- no model routing
- no token budget
- no model quality evaluation
- no hallucination control
- no tool/memory safety boundary
- no cost-per-token control
- no trace of model behavior
👉 Interview Answer
A traditional SD answer is still the foundation. But for an LLM system, that answer is incomplete because the model introduces probabilistic behavior, token constraints, quality evaluation, safety concerns, and cost dynamics. So I would keep the distributed systems foundation and add an LLM orchestration layer on top.
3️⃣ LLM System Upgrade
An AI agent system adds model-driven planning, dynamic tool selection, observations, memory, bounded loops, safety gates, and human approval for high-risk actions.
The main upgrade is to add an AI orchestration layer:
- prompt builder
- context manager
- model router
- retrieval or memory layer
- tool-calling layer when needed
- safety and policy layer
- evaluation layer
- usage and cost metering
- trace and debugging layer
Traditional System
↓
LLM Orchestration Layer
↓
Model / Retrieval / Tools / Memory
↓
Validated User-Facing Output
👉 Interview Answer
The important shift is that the model is not the whole system. The model is one execution component behind an orchestration layer. The orchestration layer decides what context enters the prompt, which model is used, which tools are allowed, how outputs are validated, and how quality is measured.
4️⃣ High-Level Architecture
User gives goal
↓
Agent creates plan
↓
Router selects tool
↓
Policy engine checks permission
↓
Executor calls tool
↓
Observation is parsed
↓
State is updated
↓
Loop controller decides continue/stop
↓
Final answer is generated
↓
Trace and metrics are stored
Core components:
- agent API
- planner
- state store
- tool registry
- tool router
- executor
- observation parser
- memory service
- policy engine
- human approval queue
- trace logger
- retry/fallback manager
Architecture principle:
Separate these layers clearly:
- product/API layer
- orchestration layer
- model execution layer
- retrieval/memory/tool layer
- safety/evaluation layer
- observability/cost layer
👉 Interview Answer
For AI Agent System Design, I would design a layered architecture. The product layer handles users and requests. The orchestration layer builds context and controls model/tool execution. The model layer generates or reasons. The safety, evaluation, and observability layers make the system production-grade.
5️⃣ Traditional SD Starting Point
Start from the system design answer an interviewer already expects: clients, API gateway, stateless services, storage, cache, queue, scaling, reliability, and observability.
For AI Agent System Design, I would specifically discuss:
- API gateway and auth
- service decomposition
- storage choice
- cache strategy
- queue and async processing
- rate limiting
- SLO and failure handling
- deployment and rollback
Staff-level detail:
A Staff-level agent design models the agent as a bounded state machine. The model may propose actions, but the system owns permissions, idempotency, budgets, loop termination, approval, and auditability.
Memorize this answer:
For AI Agent System Design, I would not replace traditional system design with an LLM call. I would keep the traditional architecture, then add an orchestration layer that controls prompts, context, model routing, tools or memory, validation, evaluation, and cost. The Staff-level goal is to make probabilistic model behavior safe, measurable, debuggable, and cost-controlled.
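The "bounded state machine" idea can be sketched in a few lines. This is a minimal illustration, not a reference design: the state names, the `ALLOWED` table, and the `transition` helper are all assumptions introduced here. The point it shows is that the model only proposes a next state, while the system decides whether the move is legal and whether the step budget allows it.

```python
from enum import Enum


class AgentState(Enum):
    PLANNING = "planning"
    AWAITING_APPROVAL = "awaiting_approval"
    EXECUTING = "executing"
    DONE = "done"
    FAILED = "failed"


# Hypothetical transition table: owned by the system, never by the model.
ALLOWED = {
    AgentState.PLANNING: {
        AgentState.EXECUTING,
        AgentState.AWAITING_APPROVAL,
        AgentState.DONE,
        AgentState.FAILED,
    },
    AgentState.AWAITING_APPROVAL: {AgentState.EXECUTING, AgentState.FAILED},
    AgentState.EXECUTING: {AgentState.PLANNING, AgentState.FAILED},
    AgentState.DONE: set(),
    AgentState.FAILED: set(),
}


def transition(current, proposed, steps_used, max_steps):
    """Return the next state; illegal moves and exhausted budgets end in FAILED."""
    if proposed not in ALLOWED[current]:
        return AgentState.FAILED
    # Once the step budget is spent, the only acceptable move is to finish.
    if steps_used >= max_steps and proposed is not AgentState.DONE:
        return AgentState.FAILED
    return proposed
```

Terminal states have no outgoing transitions, so a model that keeps proposing actions after `DONE` or `FAILED` cannot restart the task.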
6️⃣ What Changes in LLM System Design
Then explain what changes when the core intelligence is a model call: prompts, tokens, context, model routing, retrieval, tools, memory, safety, evaluation, and cost.
For AI Agent System Design, I would specifically discuss:
- prompt versioning
- context construction
- token budget
- model selection
- retrieval/memory/tool integration
- safety validation
- quality evaluation
- cost attribution
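Context construction under a token budget is easier to discuss with a concrete shape. The sketch below is one possible approach, under stated assumptions: `count_tokens` is a crude whitespace stand-in for a real tokenizer, and `build_context` and its priority scheme are hypothetical names invented for this example.

```python
def count_tokens(text: str) -> int:
    # Crude stand-in: counts whitespace-separated words.
    # A real system would use the tokenizer of the target model.
    return len(text.split())


def build_context(system_prompt, candidates, budget):
    """Greedily pack the highest-priority snippets that fit in the budget.

    candidates: list of (priority, text); higher priority is packed first.
    Returns (chosen_texts, tokens_used).
    """
    used = count_tokens(system_prompt)
    if used > budget:
        raise ValueError("system prompt alone exceeds the token budget")
    chosen = []
    for _, text in sorted(candidates, key=lambda c: c[0], reverse=True):
        cost = count_tokens(text)
        if used + cost <= budget:  # skip snippets that would overflow
            chosen.append(text)
            used += cost
    return chosen, used
```

Greedy packing is only one choice; summarizing or truncating low-priority snippets instead of dropping them trades extra latency for richer context.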
7️⃣ Request Lifecycle
Trace one request end to end so the design is not abstract. A strong answer shows exactly where context is built, where the model is called, and where outputs are validated.
For AI Agent System Design, I would specifically discuss:
- User gives goal
- Agent creates plan
- Router selects tool
- Policy engine checks permission
- Executor calls tool
- Observation is parsed
- State is updated
- Loop controller decides continue/stop
- Final answer is generated
- Trace and metrics are stored
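The lifecycle above can be sketched as a bounded plan-act-observe loop. This is a simplified illustration under assumptions: `run_agent`, `plan_next`, and `is_allowed` are hypothetical names, the planner is any callable standing in for a model call, and state is kept in a local list rather than a durable store.

```python
def run_agent(goal, plan_next, is_allowed, tools, max_steps=5):
    """Run a bounded plan-act-observe loop.

    plan_next(goal, observations) -> ("call", tool_name, args) or ("finish", answer)
    is_allowed(tool_name, args) -> bool      (stand-in for the policy engine)
    tools: dict mapping tool name -> callable(**args)
    Returns (answer_or_None, trace).
    """
    observations, trace = [], []
    for step in range(max_steps):
        action = plan_next(goal, observations)
        if action[0] == "finish":
            trace.append({"step": step, "event": "finish"})
            return action[1], trace
        _, name, args = action
        # Unknown tools and policy violations are blocked, not executed.
        if name not in tools or not is_allowed(name, args):
            observation = {"tool": name, "error": "blocked"}
        else:
            observation = {"tool": name, "result": tools[name](**args)}
        observations.append(observation)
        trace.append({"step": step, "event": "tool", **observation})
    # The loop controller, not the model, enforces termination.
    trace.append({"step": max_steps, "event": "step_limit"})
    return None, trace
```

A blocked call is fed back as an observation so the planner can choose a different action, while the step limit guarantees the loop ends even if it never does.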
8️⃣ Data and State Model
LLM systems create new state: conversation turns, prompt versions, model versions, retrieval traces, tool calls, memory records, evaluation labels, token usage, and safety decisions.
For AI Agent System Design, I would specifically discuss:
- request id
- user id / tenant id
- conversation or task id
- prompt version
- model version
- retrieval ids
- tool call ids
- memory ids
- token counts
- safety decision
- quality label
- cost record
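One way to make these fields concrete is a single record written per model call. The field names below mirror the list above; the class name `ModelCallRecord`, the default values, and the JSON serialization are assumptions made for illustration, not a prescribed schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class ModelCallRecord:
    request_id: str
    tenant_id: str
    task_id: str
    prompt_version: str
    model_version: str
    retrieval_ids: List[str] = field(default_factory=list)
    tool_call_ids: List[str] = field(default_factory=list)
    memory_ids: List[str] = field(default_factory=list)
    input_tokens: int = 0
    output_tokens: int = 0
    safety_decision: str = "allow"
    quality_label: Optional[str] = None  # filled in later by evaluation
    cost_usd: float = 0.0

    def to_json(self) -> str:
        # Stable key order keeps log lines easy to diff.
        return json.dumps(asdict(self), sort_keys=True)
```

Keeping prompt and model versions on every record is what later lets a quality regression be tied to a specific rollout.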
9️⃣ Online Path
The online path is latency-sensitive. It should be bounded, permission-safe, observable, and able to fall back when a model, retriever, or tool fails.
For AI Agent System Design, I would specifically discuss:
- explicit ownership
- bounded execution
- safe defaults
- versioned behavior
- measurable quality
- debuggable traces
- fallback path
- operational readiness
10️⃣ Offline Path
The offline path handles indexing, embedding, eval set construction, prompt testing, model rollout, cache warming, analytics, and quality regression checks.
For AI Agent System Design, I would specifically discuss:
- explicit ownership
- bounded execution
- safe defaults
- versioned behavior
- measurable quality
- debuggable traces
- fallback path
- operational readiness
11️⃣ Scaling Strategy
Traditional scaling is still necessary, but LLM systems add expensive model execution, token limits, GPU/provider bottlenecks, vector indexes, and tool fan-out.
For AI Agent System Design, I would specifically discuss:
- stateless service horizontal scaling
- queue-based async processing
- cache hot paths
- model-provider concurrency limits
- GPU or inference capacity
- vector index sharding
- tool fan-out control
- backpressure and admission control
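Provider concurrency limits and admission control can be illustrated with a non-blocking semaphore. This is a single-process sketch under assumptions: `AdmissionController` and `call_model` are hypothetical names, and a real deployment would likely need a limit shared across instances.

```python
import threading


class AdmissionController:
    """Reject work early instead of queueing without bound when saturated."""

    def __init__(self, max_in_flight: int):
        self._slots = threading.BoundedSemaphore(max_in_flight)

    def try_acquire(self) -> bool:
        # Non-blocking: returns False immediately when no slot is free.
        return self._slots.acquire(blocking=False)

    def release(self) -> None:
        self._slots.release()


def call_model(controller, do_call):
    """Run do_call() only if a concurrency slot is available."""
    if not controller.try_acquire():
        return {"status": "rejected", "reason": "provider concurrency limit"}
    try:
        return {"status": "ok", "result": do_call()}
    finally:
        controller.release()  # free the slot even if the call raises
```

Rejecting fast gives the caller a chance to fall back to a cheaper model or a queued async path rather than timing out.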
12️⃣ Latency Budget
Break down latency by gateway, orchestration, retrieval, reranking, model inference, tool calls, validation, and streaming response.
For AI Agent System Design, I would specifically discuss:
- gateway/auth latency
- context build latency
- retrieval latency
- reranking latency
- model time to first token
- model total generation time
- tool call latency
- safety validation latency
- streaming response behavior
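A latency budget is ultimately arithmetic over stages. The sketch below assumes the stages run sequentially and uses made-up millisecond figures purely for illustration; `check_latency_budget` is a hypothetical helper, not part of any library.

```python
def check_latency_budget(stage_ms: dict, budget_ms: float) -> dict:
    """Sum per-stage latencies and report headroom and the slowest stage."""
    if not stage_ms:
        raise ValueError("at least one stage is required")
    total = sum(stage_ms.values())
    return {
        "total_ms": total,
        "headroom_ms": budget_ms - total,
        "slowest_stage": max(stage_ms, key=stage_ms.get),
        "within_budget": total <= budget_ms,
    }


# Illustrative numbers only; measure real stages before setting a budget.
example_stages = {
    "gateway_auth": 20,
    "context_build": 40,
    "retrieval": 80,
    "reranking": 60,
    "model_time_to_first_token": 400,
    "tool_call": 300,
    "safety_validation": 50,
}
report = check_latency_budget(example_stages, budget_ms=1000)
```

With streaming, time to first token is usually the number users feel, so it is worth budgeting separately from total generation time.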
13️⃣ Reliability and Failure Handling
Failures include normal distributed systems failures plus model-specific failures: hallucination, low confidence, empty retrieval, unsafe tool call, context overflow, and bad prompt version.
For AI Agent System Design, I would specifically discuss:
- infinite loop
- wrong tool call
- unsafe side effect
- non-idempotent retry
- state drift
- tool hallucination
- excessive cost
- low-quality plan
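The non-idempotent retry problem can be sketched with an idempotency key derived from the task, step, tool, and arguments. This is a minimal in-memory illustration: `IdempotentExecutor` is a hypothetical name, and a production version would need a durable store with an atomic check-and-set.

```python
import hashlib
import json


class IdempotentExecutor:
    """Execute each (task, step, tool, args) at most once; replay on retry."""

    def __init__(self):
        self._results = {}  # in-memory stand-in for a durable result store

    @staticmethod
    def key(task_id, step, tool, args) -> str:
        payload = json.dumps([task_id, step, tool, args], sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def execute(self, task_id, step, tool, args, fn):
        k = self.key(task_id, step, tool, args)
        if k in self._results:
            # A retry of the same step returns the stored result
            # instead of repeating the side effect.
            return self._results[k]
        result = fn(**args)
        self._results[k] = result
        return result
```

Including the step number in the key lets the agent legitimately call the same tool with the same arguments at a later step.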
14️⃣ Security and Privacy
LLM systems must handle prompt injection, data exfiltration, tenant isolation, sensitive context, tool permissions, audit logs, and safe logging.
For AI Agent System Design, I would specifically discuss:
- tenant isolation
- least privilege tools
- prompt injection defense
- sensitive data redaction
- safe logging
- audit trails
- data retention
- approval for side effects
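Least-privilege tools, tenant isolation, and approval for side effects can be combined into one deterministic check that runs before any tool executes. The sketch below is an assumption-laden illustration: `ToolPolicy`, `authorize`, and the role names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolPolicy:
    allowed_roles: frozenset
    has_side_effects: bool


def authorize(policies, tool, role, tenant_id, resource_tenant_id):
    """Return 'allow', 'needs_approval', or 'deny'.

    Unknown tools are denied by default (safe default).
    """
    policy = policies.get(tool)
    if policy is None or role not in policy.allowed_roles:
        return "deny"
    if tenant_id != resource_tenant_id:
        return "deny"  # tenant isolation: never cross tenant boundaries
    # Side-effecting tools go to the human approval queue.
    return "needs_approval" if policy.has_side_effects else "allow"
```

Because the check takes only structured inputs, text injected into the prompt cannot change its outcome; the model can ask for a tool, but it cannot grant itself access.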
15️⃣ Evaluation and Quality
A Staff-level answer must define quality metrics, offline golden sets, online feedback, regression gates, and rollout/rollback strategy.
For AI Agent System Design, I would specifically discuss:
- task completion rate
- steps per task
- tool success rate
- loop timeout rate
- human escalation rate
- unsafe tool block rate
- cost per completed task
- state drift incidents
- offline golden set
- online feedback
- regression gate
- canary rollout
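Several of these metrics fall out of simple aggregation over run records, and a regression gate is a comparison against a baseline. The sketch below assumes a hypothetical record shape (`completed`, `steps`, `cost_usd`, `timed_out`) and a hypothetical 2-point tolerance; neither comes from a standard.

```python
def summarize_runs(runs):
    """Aggregate agent runs into headline metrics.

    runs: list of dicts with keys completed (bool), steps (int),
          cost_usd (float), timed_out (bool).
    """
    if not runs:
        raise ValueError("no runs to summarize")
    n = len(runs)
    completed = sum(1 for r in runs if r["completed"])
    total_cost = sum(r["cost_usd"] for r in runs)
    return {
        "task_completion_rate": completed / n,
        "avg_steps_per_task": sum(r["steps"] for r in runs) / n,
        "loop_timeout_rate": sum(1 for r in runs if r["timed_out"]) / n,
        # Failed runs still cost money, so divide total cost by successes.
        "cost_per_completed_task": total_cost / completed if completed else float("inf"),
    }


def passes_gate(baseline, candidate, max_drop=0.02):
    """Block a rollout if completion rate drops more than max_drop."""
    return candidate["task_completion_rate"] >= baseline["task_completion_rate"] - max_drop
```

Running `summarize_runs` over the offline golden set for both the current and the candidate prompt/model version gives the two inputs to `passes_gate`.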
16️⃣ Cost Control
LLM systems need token accounting, model tiering, cache strategy, batching, context trimming, request budgets, and cost attribution by user/team/feature.
For AI Agent System Design, I would specifically discuss:
- token accounting
- model tiering
- semantic cache
- prompt compression
- output token limits
- batching
- provider routing
- budget alerts
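Token accounting and model tiering reduce to a small amount of arithmetic. In the sketch below the prices are hypothetical placeholders (real prices vary by provider and change over time), and `call_cost` and `pick_tier` are names invented for this example.

```python
# Hypothetical prices in USD per 1,000 tokens; not real provider pricing.
PRICES = {
    "small": {"input": 0.0005, "output": 0.0015},
    "large": {"input": 0.0100, "output": 0.0300},
}


def call_cost(tier, input_tokens, output_tokens):
    """Cost of one model call, with input and output priced separately."""
    price = PRICES[tier]
    return (input_tokens / 1000) * price["input"] + (output_tokens / 1000) * price["output"]


def pick_tier(complexity, spent_usd, budget_usd, est_input, est_output):
    """Use the large tier only for complex steps that still fit the budget."""
    if complexity == "high":
        projected = spent_usd + call_cost("large", est_input, est_output)
        if projected <= budget_usd:
            return "large"
    return "small"
```

For an agent, the budget is best tracked per task rather than per call, since a single task may loop through many model calls.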
17️⃣ Observability
Trace every model call with prompt version, model version, retrieval ids, tool calls, token counts, latency, cost, safety decisions, and final quality signals.
For AI Agent System Design, I would specifically discuss:
- request trace
- prompt/model versions
- retrieval trace
- tool trace
- token counts
- latency breakdown
- cost breakdown
- safety decisions
- quality feedback
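A request trace with a latency breakdown can be sketched with a context manager that records one span per stage. This is a toy illustration using only the standard library; the `Trace` class and its span fields are assumptions, and a real system would more likely export spans to a tracing backend.

```python
import time
from contextlib import contextmanager


class Trace:
    """Collect named spans with latency, status, and free-form attributes."""

    def __init__(self, request_id):
        self.request_id = request_id
        self.spans = []

    @contextmanager
    def span(self, name, **attrs):
        start = time.perf_counter()
        status = "ok"
        try:
            # The caller may add attributes (e.g. token counts) to this dict.
            yield attrs
        except Exception:
            status = "error"
            raise
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            self.spans.append({"name": name, "status": status, "ms": elapsed_ms, **attrs})
```

Usage would look like `with trace.span("model_call", model_version="m-1") as attrs: attrs["output_tokens"] = 12`, which records versions, token counts, and latency on the same span.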
18️⃣ Trade-offs
Compare quality, latency, cost, safety, freshness, determinism, explainability, and operational complexity.
For AI Agent System Design, I would specifically discuss:
- quality vs latency
- accuracy vs cost
- freshness vs stability
- automation vs control
- model flexibility vs deterministic rules
- context richness vs token budget
- privacy vs personalization
- speed vs verification
19️⃣ Staff-Level Framing
The Staff-level answer is not that LLMs are smarter. It is that the system creates deterministic boundaries around probabilistic model behavior.
For AI Agent System Design, I would specifically discuss:
- separate deterministic guarantees from probabilistic output
- define system-owned boundaries
- make quality measurable
- make cost attributable
- make failures recoverable
- make rollouts reversible
- make traces debuggable
- avoid demo-only architecture
20️⃣ Traditional vs LLM Design Comparison
| Area | Traditional SD | LLM System Design |
|---|---|---|
| Core logic | deterministic business logic | model-assisted probabilistic reasoning |
| Input | structured API payload | messages, prompts, retrieved context, tool outputs |
| State | DB rows, cache entries, queue messages | conversation, memory, prompt/model/retrieval/tool traces |
| Correctness | unit tests and business rules | grounding, evaluation, validation, safety policy |
| Scaling | services, DB, cache, queues | plus inference capacity, token budget, vector indexes, tool fan-out |
| Cost | CPU, storage, network | plus input/output tokens, model tier, reranking, embeddings |
| Debugging | logs, metrics, traces | plus prompt version, model version, context, tool calls, eval labels |
| Failure | timeout, DB error, queue lag | plus hallucination, unsafe output, context overflow, low confidence |
21️⃣ Common Interview Follow-ups
Q: How do you start the answer?
A: Start with the traditional system baseline, then explicitly state what changes because this is an LLM system.
Q: Where does the LLM fit?
A: Behind an orchestration layer, not directly exposed as the system itself.
Q: How do you control quality?
A: Use offline evals, online feedback, regression gates, groundedness checks, and prompt/model version tracking.
Q: How do you control cost?
A: Track tokens, route models by task complexity, cache safe results, compress context, and enforce budgets.
Q: How do you handle hallucination?
A: Ground the answer when possible, validate claims, add citation checks, use refusal policy, and measure hallucination rate.
Q: How do you debug bad output?
A: Inspect the full trace: user input, context, prompt version, model version, retrieval results, tool calls, safety decisions, and final output.
Q: What is the Staff-level insight?
A: A Staff-level agent design models the agent as a bounded state machine. The model may propose actions, but the system owns permissions, idempotency, budgets, loop termination, approval, and auditability.
22️⃣ Answer Bank for Memorization
Memorization Paragraph 1
For AI Agent System Design, I would start with the traditional system design baseline and then upgrade it with LLM-specific components. The traditional baseline gives us API gateway, auth, stateless services, storage, cache, queues, scaling, and reliability. The LLM upgrade adds prompt/context construction, model routing, retrieval or memory, tool execution, safety checks, evaluation, token cost, and model observability.
Memorization Paragraph 2
The key design principle for AI Agent System Design is that the model should not be treated as the whole system. The model is one component behind an orchestration layer. The orchestration layer owns context, permissions, model choice, tool access, validation, cost control, and traces.
Memorization Paragraph 3
Compared with traditional SD, AI Agent System Design introduces probabilistic quality. That means I need both system metrics and quality metrics: latency, error rate, and cost on one side; groundedness, safety, user correction rate, and task success on the other side.
Memorization Paragraph 4
At Staff level, I would separate deterministic guarantees from model behavior. The system must deterministically enforce authorization, budgets, schema validation, logging, and rollback. The model can generate or reason, but the platform must make that behavior bounded and measurable.
Memorization Paragraph 5
I would design AI Agent System Design with both online and offline paths. The online path handles user requests under latency and safety constraints. The offline path builds indexes, eval datasets, prompt tests, analytics, cost dashboards, and regression gates so the system improves without silent quality regression.
23️⃣ Senior / Staff-Level Summary Answer
For AI Agent System Design, I would upgrade a traditional system design answer by adding an LLM orchestration layer. The traditional system still handles API, auth, services, storage, caching, queues, scaling, and reliability. The LLM layer handles prompt/context construction, model routing, retrieval or memory, tool execution, safety, evaluation, token cost, and observability. The Staff-level point is to make probabilistic model behavior safe, measurable, debuggable, and cost-controlled through deterministic system boundaries.
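Chinese Section
🎯 AI Agent System Design
The core of this topic is how to upgrade a traditional system design answer into an LLM system design answer.
Traditional SD baseline:
A traditional workflow engine executes predefined steps with state machines, queues, workers, retries, and audit logs.
LLM system upgrade:
An AI agent system adds model-driven planning, dynamic tool selection, observations, memory, bounded loops, safety gates, and human approval for high-risk actions.
1️⃣ Core Framework
When discussing AI Agent System Design, I structure the answer as follows:
- start with the traditional system design baseline
- then cover the components the LLM system adds
- draw the complete request path
- explain context/state/model/tool/memory
- cover safety, eval, cost, and observability
- close with Staff-level trade-offs
Answer to memorize:
For AI Agent System Design, I would start from traditional system design rather than jumping straight to calling an LLM. The traditional part includes gateway, service, DB, cache, queue, scaling, and reliability. The LLM upgrade includes prompt, context, model routing, retrieval, memory, tool calling, safety, eval, token cost, and observability. The Staff-level focus is wrapping probabilistic model behavior in deterministic system boundaries.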
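2️⃣ Traditional SD Starting Point
Start from traditional system design: client, gateway, stateless service, DB, cache, queue, scaling, reliability, and observability.
For AI Agent System Design, specifically cover:
- API gateway
- auth / rate limit
- stateless services
- DB / cache / queue
- replication / failover
- SLO / alerting
Paragraph to memorize:
When discussing AI Agent System Design, I explicitly separate the traditional system parts from the LLM-specific parts. The traditional system provides stable services, storage, caching, queues, and scaling. LLM orchestration handles prompt, context, model, retrieval, tool, memory, safety, eval, and cost. This answer fits Staff-level system design better than simply saying "call a large model."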
3️⃣ What Changes After Upgrading to an LLM System
An LLM system adds prompt, token, context, retrieval, tool, memory, safety, eval, and cost control.
For AI Agent System Design, specifically cover:
- prompt builder
- context manager
- model router
- retrieval / memory
- tool calling
- safety validation
- eval pipeline
- token cost control
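4️⃣ Request Path
In an interview, always trace one complete request and show where the context is built, where the model is called, and where the output is validated.
For AI Agent System Design, specifically cover: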
- User gives goal
- Agent creates plan
- Router selects tool
- Policy engine checks permission
- Executor calls tool
- Observation is parsed
- State is updated
- Loop controller decides continue/stop
- Final answer is generated
- Trace and metrics are stored
5️⃣ Data and State Model
An LLM system adds state such as conversation, prompt version, model version, retrieval trace, tool call, memory, token usage, and safety decision.
For AI Agent System Design, specifically cover:
- request id
- tenant / user id
- conversation / task id
- prompt version
- model version
- retrieval trace
- tool call trace
- token usage
- cost record
- safety decision
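6️⃣ Online Path
The online path focuses on latency, permission, bounded execution, fallback, and streaming.
For AI Agent System Design, specifically cover:
- clear boundaries
- measurable quality
- traceable request paths
- rollback-capable versions
- controllable cost
- recoverable failures
- permission safety
- production operability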
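7️⃣ Offline Path
The offline path handles indexing, embedding, eval, prompt test, model rollout, cache warming, and regression.
For AI Agent System Design, specifically cover:
- clear boundaries
- measurable quality
- traceable request paths
- rollback-capable versions
- controllable cost
- recoverable failures
- permission safety
- production operability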
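8️⃣ Scaling Strategy
Traditional horizontal scaling still matters, but an LLM system must also handle model inference cost, token limit, vector index, tool fan-out, and provider bottleneck.
For AI Agent System Design, specifically cover:
- clear boundaries
- measurable quality
- traceable request paths
- rollback-capable versions
- controllable cost
- recoverable failures
- permission safety
- production operability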
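9️⃣ Latency Budget
Break down latency across gateway, orchestrator, retriever, reranker, model inference, tool call, validation, and streaming.
For AI Agent System Design, specifically cover:
- clear boundaries
- measurable quality
- traceable request paths
- rollback-capable versions
- controllable cost
- recoverable failures
- permission safety
- production operability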
10️⃣ Reliability and Failure Handling
Beyond traditional timeout and retry, also handle hallucination, empty retrieval, context overflow, unsafe tool call, and bad prompt version.
For AI Agent System Design, specifically cover:
- infinite loop
- wrong tool call
- unsafe side effect
- non-idempotent retry
- state drift
- tool hallucination
- excessive cost
- low-quality plan
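11️⃣ Security and Privacy
Focus on prompt injection, data-leak prevention, tenant isolation, tool permission, audit, and safe logging.
For AI Agent System Design, specifically cover:
- clear boundaries
- measurable quality
- traceable request paths
- rollback-capable versions
- controllable cost
- recoverable failures
- permission safety
- production operability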
12️⃣ Evaluation System
A Staff-level answer must cover offline golden set, online feedback, quality metrics, regression gate, and rollback.
For AI Agent System Design, specifically cover:
- task completion rate
- steps per task
- tool success rate
- loop timeout rate
- human escalation rate
- unsafe tool block rate
- cost per completed task
- state drift incidents
- golden set
- regression gate
- canary rollout
- online feedback
13️⃣ Cost Control
Cost control requires token accounting, model routing, cache, batching, context trimming, request budget, and cost attribution.
For AI Agent System Design, specifically cover:
- token accounting
- model tiering
- semantic cache
- context compression
- output token limit
- batching
- budget alert
- cost attribution
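14️⃣ Observability
Trace model version, prompt version, retrieval ids, tool calls, tokens, latency, cost, safety, and quality.
For AI Agent System Design, specifically cover:
- clear boundaries
- measurable quality
- traceable request paths
- rollback-capable versions
- controllable cost
- recoverable failures
- permission safety
- production operability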
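15️⃣ Trade-offs
Compare quality, latency, cost, safety, freshness, determinism, explainability, and operational complexity.
For AI Agent System Design, specifically cover:
- quality vs latency
- accuracy vs cost
- automation vs control
- context richness vs token budget
- personalization vs privacy
- speed vs verification
- flexibility vs determinism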
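16️⃣ Staff-Level Framing
The core of a Staff-level answer is wrapping probabilistic model behavior in deterministic system boundaries.
For AI Agent System Design, specifically cover:
- clear boundaries
- measurable quality
- traceable request paths
- rollback-capable versions
- controllable cost
- recoverable failures
- permission safety
- production operability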
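17️⃣ Advanced Follow-ups
Q: How do you open?
A: State the traditional SD baseline first, then say which layers the LLM system adds.
Q: Where does the LLM sit in the architecture?
A: The LLM is an execution component behind the orchestration layer, not the whole system.
Q: How do you control quality?
A: Offline eval, online feedback, grounding, safety check, and regression gate.
Q: How do you control cost?
A: Token budget, model routing, cache, context compression, batching, and cost attribution.
Q: How do you debug?
A: Read the full trace: input, context, prompt version, model version, retrieval, tool calls, safety, and output.
Q: What is the Staff-level insight?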
A: A Staff-level agent design models the agent as a bounded state machine. The model may propose actions, but the system owns permissions, idempotency, budgets, loop termination, approval, and auditability.
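18️⃣ Answer Bank for Memorization
Memorization Paragraph 1
An answer to AI Agent System Design cannot stop at traditional SD, and it cannot only say "call an LLM." My approach is to establish the traditional system design baseline first, then add an LLM orchestration layer. The traditional part keeps services stable, the LLM part provides the intelligence, and system boundaries handle safety, cost, evaluation, and observability.
Memorization Paragraph 2
Traditional system design mainly focuses on API, services, database, cache, queues, and scaling. After upgrading to AI Agent System Design, the answer also has to cover prompt version, model version, context window, retrieval, memory, tool calling, safety policy, eval pipeline, and token cost.
Memorization Paragraph 3
The focus of a Staff-level answer is distinguishing deterministic system guarantees from probabilistic model behavior. Permissions, budgets, schema, audit, and rollback must be guaranteed deterministically by the system; the model only generates, understands, or reasons.
Memorization Paragraph 4
I would split AI Agent System Design into an online path and an offline path. The online path handles user requests, latency, safety, and fallback; the offline path handles index, eval, prompt test, cost analysis, quality regression, and rollout.
Memorization Paragraph 5
If AI Agent System Design has a quality problem, I would debug through the trace: what the user input was, what the context was, which prompt/model versions ran, what retrieval and tools returned, what the safety decision was, and why the final output passed.
19️⃣ Staff-Level Summary
The Staff-level answer for AI Agent System Design is a transition from traditional SD to LLM system design. Traditional SD provides a stable distributed-systems foundation. The LLM system adds prompt, context, model, retrieval, memory, tool, safety, eval, cost, and observability. The core is not how smart the model is, but how the system makes model behavior controllable, measurable, debuggable, and reversible.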
note:
AI Agent System Design = Traditional SD baseline + LLM orchestration layer + safety/eval/cost/observability.
User gives goal
↓
Agent creates plan
↓
Router selects tool
↓
Policy engine checks permission
↓
Executor calls tool
↓
Observation is parsed
↓
State is updated
↓
Loop controller decides continue/stop
↓
Final answer is generated
↓
Trace and metrics are stored
Remember:
- Start from traditional SD.
- Add LLM-specific layers.
- Trace one request end to end.
- Separate deterministic guarantees from probabilistic model behavior.
- End with Staff-level trade-offs.