🎯 Tool Calling Architecture in LLM Systems
1️⃣ Core Framework
When discussing Tool Calling Architecture in LLM Systems, I frame it as:
- Why LLMs need tools
- Tool definition and schema
- Tool selection
- Tool execution layer
- Permission and safety checks
- Tool result handling
- Validation and retries
- Trade-offs: flexibility vs control
2️⃣ Why Do LLMs Need Tool Calling?
LLMs are powerful at language and reasoning, but they cannot reliably do everything internally.
They cannot directly:
- Query production databases
- Access real-time data
- Send emails
- Call internal APIs
- Execute code safely
- Update business systems
- Verify current state
Without Tools
User asks question
→ LLM guesses from model knowledge
→ Risk of hallucination
With Tools
User asks question
→ LLM decides tool is needed
→ System calls tool
→ Tool returns result
→ LLM answers using real data
👉 Interview Answer
Tool calling allows an LLM system to interact with external systems.
The model can decide that it needs a tool, generate structured arguments, and use the returned result to produce a grounded answer.
This turns the LLM from a text generator into an action-capable system.
3️⃣ What Is a Tool?
Tool Definition
A tool is an external capability exposed to the LLM system.
Examples:
- Search documents
- Query database
- Call payment API
- Read calendar
- Send email
- Execute code
- Retrieve logs
- Create ticket
Tool Schema
A tool usually has:
- Name
- Description
- Input schema
- Output schema
- Permission scope
- Timeout policy
- Retry policy
Example Tool Schema
{
  "name": "search_incidents",
  "description": "Search historical incidents for a service",
  "input_schema": {
    "service": "string",
    "time_range": "string",
    "severity": "string"
  },
  "output_schema": {
    "incidents": "array",
    "count": "number"
  }
}
👉 Interview Answer
A tool is a controlled interface between the LLM and an external system.
In production, tools should be defined with clear schemas, descriptions, permissions, timeouts, retries, and output contracts.
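A minimal sketch of how the fields listed above could be carried in code. The `ToolSpec` class, the `incidents:read` scope string, the default timeout and retry values, and the `to_model_view` helper are all assumptions for illustration, not part of any particular framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolSpec:
    """Declarative description of one tool (field names are illustrative)."""
    name: str
    description: str
    input_schema: dict           # field name -> expected type name
    output_schema: dict
    permission_scope: str        # hypothetical scope string, e.g. "incidents:read"
    timeout_seconds: float = 5.0
    max_retries: int = 2
    is_write: bool = False       # write tools get stricter handling


SEARCH_INCIDENTS = ToolSpec(
    name="search_incidents",
    description="Search historical incidents for a service",
    input_schema={"service": "string", "time_range": "string", "severity": "string"},
    output_schema={"incidents": "array", "count": "number"},
    permission_scope="incidents:read",
)


def to_model_view(spec: ToolSpec) -> dict:
    """Expose only what the LLM needs; policy fields stay on the server side."""
    return {
        "name": spec.name,
        "description": spec.description,
        "input_schema": spec.input_schema,
    }
```

The design choice here is that permission, timeout, and retry settings live next to the schema but are never shown to the model, so the model cannot be talked into changing them.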
4️⃣ High-Level Tool Calling Architecture
Architecture
User Request
→ Prompt Builder
→ LLM
→ Tool Call Request
→ Tool Router
→ Permission Check
→ Tool Executor
→ External System
→ Tool Result
→ Result Validator
→ LLM
→ Final Response
Key Components
LLM
Decides whether a tool is needed.
Tool Router
Maps tool name to the correct implementation.
Permission Layer
Checks whether the user or agent can call the tool.
Tool Executor
Actually calls the external API or service.
Result Validator
Checks whether the tool output is valid and safe.
👉 Interview Answer
A production tool calling architecture should not let the LLM directly execute actions.
The LLM only proposes a structured tool call.
The application validates permissions, routes the request, executes the tool, validates the result, and then returns the result back to the model.
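The flow above can be sketched as one function that takes a model-proposed call and walks it through each layer in order. The `TOOLS`, `ALLOWED`, and `OUTPUT_KEYS` tables and the stubbed `search_incidents` backend are hypothetical stand-ins.

```python
from typing import Callable

# Hypothetical registry, policy table, and output contract for illustration only.
TOOLS: dict[str, Callable[[dict], dict]] = {
    "search_incidents": lambda args: {"incidents": [], "count": 0},
}
ALLOWED: dict[str, set[str]] = {"alice": {"search_incidents"}}
OUTPUT_KEYS: dict[str, set[str]] = {"search_incidents": {"incidents", "count"}}


def handle_tool_call(user: str, proposal: dict) -> dict:
    """Run one proposed call through route -> permission -> execute -> validate."""
    name = proposal.get("name")
    args = proposal.get("arguments", {})
    impl = TOOLS.get(name)                          # tool router
    if impl is None:
        return {"status": "error", "error": "unknown_tool"}
    if name not in ALLOWED.get(user, set()):        # permission check
        return {"status": "error", "error": "permission_denied"}
    result = impl(args)                             # tool executor
    expected = OUTPUT_KEYS.get(name, set())
    if not isinstance(result, dict) or not expected.issubset(result):
        return {"status": "error", "error": "invalid_output"}   # result validator
    return {"status": "ok", "result": result}
```

The model only ever supplies `proposal`; every decision about whether the call runs is made by code the model cannot influence.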
5️⃣ Tool Selection
How Tool Selection Works
The LLM receives tool descriptions.
Then it decides:
- Is a tool needed?
- Which tool should be used?
- What arguments should be passed?
- Should multiple tools be called?
Example
User asks:
"Why did the payment API latency spike yesterday?"
LLM selects:
- query_metrics
- search_logs
- search_deployments
- search_incidents
Tool Selection Risk
The LLM may choose:
- Wrong tool
- Wrong arguments
- Too many tools
- Dangerous tools
- No tool when a tool is needed
👉 Interview Answer
Tool selection is usually handled by the LLM based on tool descriptions and task context.
However, because tool selection can be wrong, production systems should constrain available tools, validate arguments, and sometimes use deterministic routing for high-risk workflows.
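One way to constrain the available tools is to decide in code which tool names the model is shown for a given request. This is a sketch under stated assumptions: the `CATALOG` risk tiers, the `rollback_release` intent, and the idea that an upstream classifier or UI supplies `intent` are all hypothetical.

```python
from typing import Optional

# Hypothetical catalog: tool name -> risk tier.
CATALOG = {
    "query_metrics": "read",
    "search_logs": "read",
    "search_deployments": "read",
    "search_incidents": "read",
    "trigger_deployment": "write",
}

# Hypothetical high-risk intents where code, not the model, picks the tool.
DETERMINISTIC_ROUTES = {"rollback_release": ["trigger_deployment"]}


def tools_for_request(intent: Optional[str], user_scopes: set) -> list:
    """Return the tool names the model may see for this request."""
    if intent in DETERMINISTIC_ROUTES:
        # Deterministic routing: the tool list is fixed by the workflow.
        return [t for t in DETERMINISTIC_ROUTES[intent] if CATALOG[t] in user_scopes]
    # Default: expose only read tools the caller is scoped for.
    return sorted(t for t, tier in CATALOG.items()
                  if tier == "read" and tier in user_scopes)
```

A tool the model never sees cannot be selected by mistake, which removes the "dangerous tool" risk for most requests before validation even starts.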
6️⃣ Tool Argument Generation
Why Arguments Matter
Even if the correct tool is selected, the arguments may be wrong.
Example Bad Argument
{
  "service": "all_services",
  "time_range": "last_10_years"
}
This may be too expensive or unsafe.
Controls
- JSON schema validation
- Required fields
- Type validation
- Range limits
- Allowlisted values
- Business rule checks
👉 Interview Answer
Tool arguments should never be blindly trusted.
The system should validate the generated arguments against schemas, allowed values, permission rules, and business constraints before executing the tool.
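The controls above can be sketched with plain Python, without a schema library. The allowlists, the 30-day limit, and the `last_<n>_days` format are assumptions chosen so that the bad example above is rejected.

```python
import re

ALLOWED_SERVICES = {"payments", "checkout", "search"}    # hypothetical allowlist
ALLOWED_SEVERITIES = {"low", "medium", "high"}           # hypothetical allowlist
MAX_RANGE_DAYS = 30                                      # hypothetical business limit


def validate_search_args(args: dict) -> list:
    """Return a list of problems; an empty list means the call may proceed."""
    errors = []
    for key in ("service", "time_range"):                # required fields
        if not isinstance(args.get(key), str):           # type validation
            errors.append(f"{key}: required string")
    if errors:
        return errors
    if args["service"] not in ALLOWED_SERVICES:          # allowlisted values
        errors.append("service: not in allowlist")
    match = re.fullmatch(r"last_(\d+)_days", args["time_range"])
    if match is None:
        errors.append("time_range: expected 'last_<n>_days'")
    elif not 1 <= int(match.group(1)) <= MAX_RANGE_DAYS:  # range limit
        errors.append(f"time_range: must be 1-{MAX_RANGE_DAYS} days")
    severity = args.get("severity")                      # optional field
    if severity is not None and (
        not isinstance(severity, str) or severity not in ALLOWED_SEVERITIES
    ):
        errors.append("severity: not in allowlist")
    return errors
```

Returning a list of readable problems, rather than raising on the first one, lets the orchestrator hand all of them back to the model in a single correction round.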
7️⃣ Tool Router
What Is a Tool Router?
The tool router maps a tool call request to the correct backend implementation.
Tool name: search_logs
→ Logs API client
Tool name: query_metrics
→ Metrics service client
Tool name: create_ticket
→ Ticketing system client
Router Responsibilities
- Find correct tool implementation
- Validate tool availability
- Apply environment-specific routing
- Add authentication
- Enforce timeout policy
- Return standardized result
👉 Interview Answer
The tool router is the layer that maps model-generated tool requests to real backend implementations.
It should standardize execution, handle authentication, enforce timeouts, and return structured results to the agent.
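A router can be as small as a name-to-callable registry that wraps every outcome in one envelope. In this sketch the `ToolRouter` class, the stub clients, and the envelope keys (`tool`, `status`, `data`, `error`) are assumptions; authentication and timeouts are left out to keep it short.

```python
from typing import Callable


# Stand-in clients; real ones would wrap the Logs and Metrics APIs.
def _logs_client(args: dict) -> dict:
    return {"lines": []}


def _metrics_client(args: dict) -> dict:
    return {"series": []}


class ToolRouter:
    def __init__(self) -> None:
        self._impls: dict[str, Callable[[dict], dict]] = {}

    def register(self, name: str, impl: Callable[[dict], dict]) -> None:
        self._impls[name] = impl

    def dispatch(self, name: str, args: dict) -> dict:
        """Look up the implementation and return a standardized result."""
        impl = self._impls.get(name)
        if impl is None:                       # tool availability check
            return {"tool": name, "status": "error", "error": "tool_unavailable"}
        try:
            return {"tool": name, "status": "ok", "data": impl(args)}
        except Exception as exc:               # map backend failures to a structured error
            return {"tool": name, "status": "error", "error": type(exc).__name__}


router = ToolRouter()
router.register("search_logs", _logs_client)
router.register("query_metrics", _metrics_client)
```

Because every path returns the same envelope, the agent loop never has to handle a raw exception or a backend-specific response shape.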
8️⃣ Permission and Safety Layer
Why It Is Required
Tools can perform real actions.
Without permission checks, an agent may:
- Read unauthorized data
- Write to production systems
- Send incorrect emails
- Delete resources
- Expose sensitive data
Permission Checks
Check:
- User identity
- Agent role
- Tool scope
- Data sensitivity
- Environment
- Read vs write access
- Approval requirement
Safer Pattern
LLM proposes tool call
→ System validates permission
→ Tool executes only if allowed
👉 Interview Answer
The LLM should not be the source of authority for tool permissions.
Permissions must be enforced by the application layer, based on user identity, agent role, tool scope, data sensitivity, and environment.
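A sketch of an application-layer permission check that covers the items listed above. `CallContext`, `ToolPolicy`, the `operator` role name, and the reason strings are hypothetical; the key property is that every input comes from server-side session state, never from model output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallContext:
    user_id: str
    agent_role: str
    environment: str              # e.g. "staging" or "production"
    granted_scopes: frozenset


@dataclass(frozen=True)
class ToolPolicy:
    required_scope: str
    is_write: bool
    allowed_environments: frozenset
    needs_approval: bool = False


def check_permission(ctx: CallContext, policy: ToolPolicy,
                     approved: bool = False) -> tuple:
    """Return (allowed, reason) from server-side context only."""
    if policy.required_scope not in ctx.granted_scopes:      # tool scope
        return False, "missing_scope"
    if ctx.environment not in policy.allowed_environments:   # environment
        return False, "environment_not_allowed"
    if policy.is_write and ctx.agent_role != "operator":     # read vs write access
        return False, "role_cannot_write"
    if policy.needs_approval and not approved:                # approval requirement
        return False, "approval_required"
    return True, "allowed"
```

Returning a reason alongside the decision makes the outcome loggable and lets the agent explain a refusal without guessing.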
9️⃣ Read Tools vs Write Tools
Read Tools
Read tools retrieve information.
Examples:
- Search documents
- Query metrics
- Read database
- Fetch logs
- List tickets
Lower risk, but read tools still need access control.
Write Tools
Write tools change state.
Examples:
- Send email
- Create ticket
- Update database
- Trigger deployment
- Issue refund
Higher risk.
Rule
Read tools → permission checks
Write tools → permission checks + validation + approval
👉 Interview Answer
I separate tools into read tools and write tools.
Read tools still require access control, but write tools need stronger safeguards, such as idempotency, validation, audit logs, and sometimes human approval.
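For write tools, the extra safeguards can be sketched as an approval gate, an idempotency key, and an audit entry around the side effect. The in-memory `_completed` store, the `execute_write` signature, and the key derivation are assumptions; a real system would use a durable store.

```python
import hashlib
import json

_completed: dict[str, dict] = {}    # idempotency key -> stored result (in-memory stand-in)
audit_log: list[dict] = []


def idempotency_key(tool: str, args: dict, request_id: str) -> str:
    """Derive a stable key from the tool, its arguments, and the request."""
    payload = json.dumps({"tool": tool, "args": args, "request": request_id},
                         sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


def execute_write(tool: str, args: dict, request_id: str,
                  approved: bool, action) -> dict:
    """Run a side-effecting action at most once per idempotency key."""
    if not approved:                          # approval gate
        return {"status": "pending_approval"}
    key = idempotency_key(tool, args, request_id)
    if key in _completed:                     # replay: no second side effect
        return _completed[key]
    result = {"status": "ok", "data": action(args)}
    _completed[key] = result
    audit_log.append({"tool": tool, "args": args, "request": request_id, "key": key})
    return result
```

With this shape, a retried or duplicated `issue_refund` call returns the first result instead of issuing a second refund.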
🔟 Tool Execution
Execution Flow
Tool call request
→ Validate schema
→ Check permissions
→ Apply timeout
→ Execute API call
→ Normalize response
→ Validate output
→ Return result
Important Controls
- Timeout
- Retry
- Circuit breaker
- Rate limit
- Idempotency key
- Audit log
- Error mapping
Tool Failure Example
Metrics API times out
→ Tool executor returns structured error
→ Agent decides retry, fallback, or explain limitation
👉 Interview Answer
Tool execution should be handled by a controlled execution layer.
This layer applies schema validation, permissions, timeouts, retries, rate limits, idempotency, and standardized error handling.
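The timeout and error-mapping part of the execution layer can be sketched with the standard library's thread pool. `run_with_timeout` and its result shape are assumptions. Note the limitation stated in the comment: a thread that is already running cannot be force-stopped, so the backend call may still complete after the caller has given up.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

_pool = ThreadPoolExecutor(max_workers=4)


def run_with_timeout(fn, args: dict, timeout_s: float) -> dict:
    """Execute fn(args) and map every outcome to one structured result shape."""
    started = time.monotonic()
    future = _pool.submit(fn, args)
    try:
        data = future.result(timeout=timeout_s)
        status, error = "ok", None
    except FutureTimeout:
        # cancel() only stops a call that has not started; a running call keeps going.
        future.cancel()
        data, status, error = None, "error", "timeout"
    except Exception as exc:                  # error mapping for backend failures
        data, status, error = None, "error", type(exc).__name__
    latency_ms = round((time.monotonic() - started) * 1000)
    return {"status": status, "error": error, "data": data, "latency_ms": latency_ms}
```

This is why timeouts and idempotency keys belong together: after a timeout the caller does not know whether the side effect happened.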
1️⃣1️⃣ Tool Result Handling
Why Result Handling Matters
Tool results can be:
- Too large
- Partial
- Empty
- Stale
- Invalid
- Sensitive
- Contradictory
Result Processing
Before sending result to LLM:
- Normalize
- Summarize
- Filter sensitive data
- Validate schema
- Add metadata
- Mark partial failures
Example
Raw log results: 10,000 lines
→ Summarize top error patterns
→ Send compact result to LLM
👉 Interview Answer
Tool results should be processed before being sent back to the LLM.
The system should normalize, summarize, filter, and validate results, especially when outputs are large, sensitive, partial, or inconsistent.
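The log example above can be sketched as a function that collapses raw lines into a few counted patterns. The `ERROR` marker, the email regex used as the only redaction rule, and the output keys are simplifying assumptions.

```python
import re
from collections import Counter

MAX_PATTERNS = 3
_EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def compact_log_result(lines: list, truncated: bool = False) -> dict:
    """Reduce raw log lines to top error patterns with sensitive values masked."""
    patterns: Counter = Counter()
    for line in lines:
        if "ERROR" not in line:
            continue
        masked = _EMAIL.sub("<email>", line)       # filter sensitive data
        masked = re.sub(r"\d+", "<n>", masked)     # collapse ids and timestamps
        patterns[masked.strip()] += 1
    return {
        "total_lines": len(lines),                 # metadata
        "error_lines": sum(patterns.values()),
        "top_errors": patterns.most_common(MAX_PATTERNS),
        "partial": truncated,                      # mark partial results explicitly
    }
```

The model then receives a handful of patterns with counts instead of thousands of raw lines, and the `partial` flag tells it not to treat missing errors as proof that nothing else went wrong.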
1️⃣2️⃣ Validation and Retry
Validation Types
Validate:
- Tool arguments
- Tool permissions
- Tool output schema
- Business rules
- Safety policy
- Final answer correctness
Retry Strategy
Retry only when safe.
Timeout → retry with backoff (write tools only with an idempotency key)
Invalid argument → ask model to fix
Permission denied → do not retry
Dangerous action → require approval
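The mapping above can be sketched as a small classifier plus a bounded retry loop. The error names, the three-attempt default, and the `{"status", "error"}` result shape are assumptions; the loop is only appropriate for read tools or idempotent writes.

```python
import time

RETRYABLE = {"timeout", "rate_limited"}                   # transient errors
FIXABLE_BY_MODEL = {"invalid_argument"}                   # hand back to the model
TERMINAL = {"permission_denied", "approval_required"}     # never retried automatically


def next_step(error: str, attempt: int, max_attempts: int = 3) -> str:
    """Classify an error into the action the orchestrator should take."""
    if error in TERMINAL or attempt >= max_attempts:
        return "stop"
    if error in RETRYABLE:
        return "retry"
    if error in FIXABLE_BY_MODEL:
        return "ask_model_to_fix"
    return "stop"                                         # unknown errors fail closed


def call_with_retries(tool, args: dict, max_attempts: int = 3,
                      base_delay_s: float = 0.0) -> dict:
    """Call tool(args) until it succeeds or the error is no longer retryable."""
    attempt = 1
    while True:
        result = tool(args)
        if result["status"] == "ok":
            return result
        if next_step(result["error"], attempt, max_attempts) != "retry":
            return result
        time.sleep(base_delay_s * 2 ** (attempt - 1))     # exponential backoff
        attempt += 1
```

Unknown error types fall through to "stop", so a new failure mode cannot silently turn into a retry storm.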
Avoid Infinite Loops
Use:
- Max retry count
- Max tool calls
- Max agent steps
- Cost budgets
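These limits can be sketched as a single budget object that the agent loop charges on every step. The `RunBudget` class and its default limits are hypothetical values, not recommendations.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunBudget:
    max_steps: int = 8
    max_tool_calls: int = 12
    max_cost_usd: float = 0.50
    steps: int = 0
    tool_calls: int = 0
    cost_usd: float = 0.0

    def charge(self, tool_calls: int = 0, cost_usd: float = 0.0) -> None:
        """Record one agent step and whatever it consumed."""
        self.steps += 1
        self.tool_calls += tool_calls
        self.cost_usd += cost_usd

    def exhausted(self) -> Optional[str]:
        """Return the name of the first limit reached, or None if none is."""
        if self.steps >= self.max_steps:
            return "max_steps"
        if self.tool_calls >= self.max_tool_calls:
            return "max_tool_calls"
        if self.cost_usd >= self.max_cost_usd:
            return "cost_budget"
        return None
```

A loop would call `charge(...)` after each model turn and stop, with an explanation to the user, as soon as `exhausted()` returns a reason.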
👉 Interview Answer
Tool calling systems need validation and retry logic.
But retries should be controlled.
Some errors are retryable, while permission or safety errors should stop execution immediately.
1️⃣3️⃣ Observability
What to Log
- User request ID
- Prompt version
- Model version
- Tool name
- Tool arguments
- Permission decision
- Execution latency
- Tool status
- Error type
- Result size
- Final outcome
Why It Matters
Tool calling failures are often hard to debug.
You need to know:
- Why the tool was selected
- What arguments were used
- Whether permission passed
- What the tool returned
- How the LLM used the result
👉 Interview Answer
Tool calling needs detailed observability.
I would log tool selection, arguments, permission decisions, execution latency, errors, result size, and final outcome.
Without this, tool-using agents are very difficult to debug.
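A sketch of one structured log line per tool call, carrying the fields listed above. The function name, the key names, and the `SENSITIVE_KEYS` redaction list are assumptions; a real system would likely emit this through its existing logging or tracing stack.

```python
import json
import time

SENSITIVE_KEYS = {"password", "token", "api_key"}   # hypothetical redaction list


def tool_call_record(request_id: str, tool: str, args: dict, permission: str,
                     status: str, latency_ms: int, result_size: int,
                     error_type=None, prompt_version: str = "unknown",
                     model_version: str = "unknown") -> str:
    """Serialize one tool call as a single JSON log line."""
    safe_args = {k: ("<redacted>" if k in SENSITIVE_KEYS else v)
                 for k, v in args.items()}
    record = {
        "timestamp": time.time(),
        "request_id": request_id,
        "prompt_version": prompt_version,
        "model_version": model_version,
        "tool_name": tool,
        "tool_arguments": safe_args,        # logged after redaction
        "permission_decision": permission,
        "tool_status": status,
        "error_type": error_type,
        "latency_ms": latency_ms,
        "result_size": result_size,
    }
    return json.dumps(record, sort_keys=True)
```

Keeping `request_id` on every record is what lets the selection, permission decision, and result for one user request be joined back together during debugging.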
1️⃣4️⃣ Common Failure Modes
Failure Modes
Tool calling can fail because:
- Wrong tool selected
- Bad arguments generated
- Permission denied
- Tool timeout
- Tool returns stale data
- Tool output too large
- LLM misinterprets result
- Write action executed incorrectly
Example
Agent calls production deployment tool
instead of staging deployment tool
Prevention
- Tool allowlists
- Environment constraints
- Approval workflow
- Output validation
- Strong logging
- Human-in-the-loop
👉 Interview Answer
Tool calling failures often happen at the boundary between probabilistic reasoning and deterministic systems.
The system must validate tool choices, arguments, permissions, environment, and outputs before trusting the result.
1️⃣5️⃣ Best Practices
Practical Rules
- Define tools with strict schemas
- Keep tools narrow and specific
- Separate read and write tools
- Enforce permissions outside the LLM
- Validate all arguments
- Normalize all outputs
- Use human approval for risky actions
- Add audit logs
- Set step, retry, and cost limits
Design Principle
The LLM suggests actions.
The system controls execution.
👉 Interview Answer
The best tool calling systems treat the LLM as a planner, not as the authority.
The LLM can suggest tool calls, but the system must enforce schemas, permissions, safety, validation, and execution control.
🧠 Staff-Level Final Answer
👉 Interview Answer (Full Version)
Tool calling is the architecture that allows LLM systems to interact with external systems.
Without tools, an LLM can only generate text from its model knowledge, which creates hallucination risk and prevents it from using real-time or private data.
With tool calling, the model can decide that it needs an external capability, generate a structured tool request, and use the returned result to produce a grounded answer or continue an agent workflow.
In production, I would not let the LLM directly execute tools.
The LLM should only propose a structured tool call.
The application layer should validate the tool name, arguments, permissions, safety policy, environment, and business rules before execution.
A typical architecture includes a prompt builder, LLM, tool router, permission layer, tool executor, external systems, result validator, and final response generator.
I would also separate read tools from write tools.
Read tools retrieve information, such as documents, metrics, logs, or database records.
Write tools modify state, such as sending emails, updating databases, creating tickets, triggering deployments, or issuing refunds.
Write tools require stronger safeguards: idempotency, audit logs, approval workflows, and deterministic validation.
Tool results should also be processed before going back to the model.
The system should normalize, summarize, filter sensitive data, mark partial failures, and validate output schemas.
The main risks are wrong tool selection, bad arguments, permission bypass, stale data, tool timeout, large outputs, and the LLM misinterpreting tool results.
So production tool calling needs strong observability, including prompt version, model version, tool name, arguments, permission decision, latency, error type, result size, and final outcome.
The core principle is: the LLM suggests actions, but the system controls execution.
⭐ Final Insight
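The core of tool calling is not "letting the LLM call APIs freely."
The real core is:
LLM proposes.
System validates.
Backend executes.
Guardrails control.
The most important principle in production:
The model should not hold execution authority.
The model is only responsible for generating structured intent.
Real permissions, validation, safety, execution, and auditing must be controlled by the application layer and backend systems.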
📌 Staff Memorization Pack
30-Second Answer
Tool calling lets an LLM interact with real systems, but production tool calling must be schema-bound, permissioned, validated, observable, and safe to retry.
In production, I would design it with explicit boundaries around planning, execution, validation, permissions, state, observability, and fallback behavior.
2-Minute Staff Answer
For Tool Calling Architecture in LLM Systems, I would start by separating the model’s reasoning role from the system’s execution guarantees.
The LLM can interpret ambiguous intent, produce plans, choose tools, summarize context, and adapt to observations. But the surrounding platform must enforce deterministic controls: schemas, permissions, timeouts, retries, idempotency, audit logging, and policy checks.
My design would include a clear orchestration layer, bounded tool access, managed state, validation after important steps, and human approval for high-risk actions. I would also add tracing for every model call, tool call, decision point, and failure so the system can be debugged and improved.
The staff-level trade-off is autonomy versus control. More autonomy improves flexibility, but it increases cost, latency, unpredictability, and safety risk. A production design should give the agent enough freedom to solve ambiguous tasks while keeping irreversible or correctness-critical actions inside deterministic backend systems.
Architecture Points to Memorize
- Tool registry defines names, schemas, descriptions, and ownership
- Planner decides when a tool is needed
- Tool router validates the selected tool and arguments
- Policy engine checks user permission and action risk
- Executor calls the external API or internal service
- Result normalizer converts output into model-readable context
- Validator checks result correctness and safety
- Audit log records the call, arguments, result summary, and user intent
Failure Modes to Call Out
- wrong tool choice
- invalid arguments
- unsafe side effects
- non-idempotent retries
- tool output prompt injection
- permission escalation
- schema drift
- hidden latency and cost
Guardrails and Controls
A strong production answer should mention:
- tool allowlists and per-tool permissions
- input and output schema validation
- max step limits and cost budgets
- timeout and retry policy
- idempotency keys for side-effecting actions
- human approval for high-risk operations
- prompt, model, and tool version tracking
- agent trace logging
- evaluation datasets and regression tests
- fallback to deterministic backend or manual review
Common Follow-up Questions
How do you make it reliable?
I would constrain the action space, validate every tool call, make side effects idempotent, add step limits, log full traces, and convert production failures into eval cases. Reliability comes from the system around the model, not from trusting the model blindly.
How do you control cost and latency?
I would use smaller models for simple steps, cache stable context, limit retrieval size, set max iterations, parallelize safe independent work, and stop early when confidence is high enough. I would track cost per task, tokens per step, tool latency, and timeout rate.
How do you handle unsafe actions?
I would classify actions by risk. Read-only actions can be more automated, but writes, money movement, permission changes, deletion, external communication, and compliance-sensitive actions should require deterministic validation or human approval.
How do you debug failures?
I would inspect the agent trace: user goal, prompt version, retrieved context, plan, tool calls, observations, validation results, and final output. Without step-level traces, agent failures are almost impossible to debug at production quality.
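Recitation Version
The staff-level answer for Tool Calling Architecture in LLM Systems is not about how smart the model is; it is about how to turn an agent into a controllable production system.
The LLM is responsible for understanding the goal, decomposing the task, choosing tools, summarizing context, and adjusting the plan based on observations. The deterministic backend, however, must own permissions, schema validation, business rules, idempotency, transactions, auditing, and compliance.
I would split the system into an orchestrator, planner, tool router, execution layer, memory/state store, validator, guardrails, observability, and a fallback path. Every step needs a trace, every tool call needs permission and argument validation, and high-risk actions need human review or deterministic validation.
The staff-level trade-off is autonomy versus control. The more autonomy, the more flexible the system, but latency, cost, debugging difficulty, and safety risk rise with it. A production design should therefore restrict the agent's action space and leave irreversible and correctness-critical actions to the traditional backend.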
Staff-Level Final Sentence
At staff level, I would separate reasoning from execution. The LLM may propose a tool call, but deterministic infrastructure should validate schema, permission, idempotency, timeout, retry policy, and audit logging.
Implement