aaa-at AI Agents & Automation ·

🎯 Tool Calling Architecture in LLM Systems

1️⃣ Core Framework

When discussing Tool Calling Architecture in LLM Systems, I frame it as:

Why LLMs need tools
Tool definition and schema
Tool selection
Tool execution layer
Permission and safety checks
Tool result handling
Validation and retries
Trade-offs: flexibility vs control

2️⃣ Why Do LLMs Need Tool Calling?

LLMs are powerful at language and reasoning, but they cannot reliably do everything internally.

They cannot directly:

Query production databases
Access real-time data
Send emails
Call internal APIs
Execute code safely
Update business systems
Verify current state

Without Tools

User asks question
→ LLM guesses from model knowledge
→ Risk of hallucination

With Tools

User asks question
→ LLM decides tool is needed
→ System calls tool
→ Tool returns result
→ LLM answers using real data

👉 Interview Answer

Tool calling allows an LLM system to interact with external systems.

The model can decide that it needs a tool, generate structured arguments, and use the returned result to produce a grounded answer.

This turns the LLM from a text generator into an action-capable system.

3️⃣ What Is a Tool?

Tool Definition

A tool is an external capability exposed to the LLM system.

Examples:

Search documents
Query database
Call payment API
Read calendar
Send email
Execute code
Retrieve logs
Create ticket

Tool Schema

A tool usually has:

Name
Description
Input schema
Output schema
Permission scope
Timeout policy
Retry policy

Example Tool Schema

{
  "name": "search_incidents",
  "description": "Search historical incidents for a service",
  "input_schema": {
    "service": "string",
    "time_range": "string",
    "severity": "string"
  },
  "output_schema": {
    "incidents": "array",
    "count": "number"
  }
}

👉 Interview Answer

A tool is a controlled interface between the LLM and an external system.

In production, tools should be defined with clear schemas, descriptions, permissions, timeouts, retries, and output contracts.

4️⃣ High-Level Tool Calling Architecture

Architecture

User Request
→ Prompt Builder
→ LLM
→ Tool Call Request
→ Tool Router
→ Permission Check
→ Tool Executor
→ External System
→ Tool Result
→ Result Validator
→ LLM
→ Final Response

Key Components

LLM

Decides whether a tool is needed.

Tool Router

Maps tool name to the correct implementation.

Permission Layer

Checks whether the user or agent can call the tool.

Tool Executor

Actually calls the external API or service.

Result Validator

Checks whether the tool output is valid and safe.

👉 Interview Answer

A production tool calling architecture should not let the LLM directly execute actions.

The LLM only proposes a structured tool call.

The application validates permissions, routes the request, executes the tool, validates the result, and then returns the result back to the model.

5️⃣ Tool Selection

How Tool Selection Works

The LLM receives tool descriptions.

Then it decides:

Is a tool needed?
Which tool should be used?
What arguments should be passed?
Should multiple tools be called?

Example

User asks:
"Why did the payment API latency spike yesterday?"

LLM selects:
- query_metrics
- search_logs
- search_deployments
- search_incidents

Tool Selection Risk

The LLM may choose:

Wrong tool
Wrong arguments
Too many tools
Dangerous tools
No tool when tool is needed

👉 Interview Answer

Tool selection is usually handled by the LLM based on tool descriptions and task context.

However, because tool selection can be wrong, production systems should constrain available tools, validate arguments, and sometimes use deterministic routing for high-risk workflows.

6️⃣ Tool Argument Generation

Why Arguments Matter

Even if the correct tool is selected, the arguments may be wrong.

Example Bad Argument

{
  "service": "all_services",
  "time_range": "last_10_years"
}

This may be too expensive or unsafe.

Controls

JSON schema validation
Required fields
Type validation
Range limits
Allowlisted values
Business rule checks

👉 Interview Answer

Tool arguments should never be blindly trusted.

The system should validate the generated arguments against schemas, allowed values, permission rules, and business constraints before executing the tool.

7️⃣ Tool Router

What Is a Tool Router?

The tool router maps a tool call request to the correct backend implementation.

Tool name: search_logs
→ Logs API client

Tool name: query_metrics
→ Metrics service client

Tool name: create_ticket
→ Ticketing system client

Router Responsibilities

Find correct tool implementation
Validate tool availability
Apply environment-specific routing
Add authentication
Enforce timeout policy
Return standardized result

👉 Interview Answer

The tool router is the layer that maps model-generated tool requests to real backend implementations.

It should standardize execution, handle authentication, enforce timeouts, and return structured results to the agent.

8️⃣ Permission and Safety Layer

Why It Is Required

Tools can perform real actions.

Without permission checks, an agent may:

Read unauthorized data
Write to production systems
Send incorrect emails
Delete resources
Expose sensitive data

Permission Checks

Check:

User identity
Agent role
Tool scope
Data sensitivity
Environment
Read vs write access
Approval requirement

Safer Pattern

LLM proposes tool call
→ System validates permission
→ Tool executes only if allowed

👉 Interview Answer

The LLM should not be the source of authority for tool permissions.

Permissions must be enforced by the application layer, based on user identity, agent role, tool scope, data sensitivity, and environment.

9️⃣ Read Tools vs Write Tools

Read Tools

Read tools retrieve information.

Examples:

Search documents
Query metrics
Read database
Fetch logs
List tickets

Lower risk, but still need access control.

Write Tools

Write tools change state.

Examples:

Send email
Create ticket
Update database
Trigger deployment
Issue refund

Higher risk.

Rule

Read tools → permission checks
Write tools → permission checks + validation + approval

👉 Interview Answer

I separate tools into read tools and write tools.

Read tools still require access control, but write tools need stronger safeguards, such as idempotency, validation, audit logs, and sometimes human approval.

🔟 Tool Execution

Execution Flow

Tool call request
→ Validate schema
→ Check permissions
→ Apply timeout
→ Execute API call
→ Normalize response
→ Validate output
→ Return result

Important Controls

Timeout
Retry
Circuit breaker
Rate limit
Idempotency key
Audit log
Error mapping

Tool Failure Example

Metrics API times out
→ Tool executor returns structured error
→ Agent decides retry, fallback, or explain limitation

👉 Interview Answer

Tool execution should be handled by a controlled execution layer.

This layer applies schema validation, permissions, timeouts, retries, rate limits, idempotency, and standardized error handling.

1️⃣1️⃣ Tool Result Handling

Why Result Handling Matters

Tool results can be:

Too large
Partial
Empty
Stale
Invalid
Sensitive
Contradictory

Result Processing

Before sending result to LLM:

Normalize
Summarize
Filter sensitive data
Validate schema
Add metadata
Mark partial failures

Example

Raw log results: 10,000 lines
→ Summarize top error patterns
→ Send compact result to LLM

👉 Interview Answer

Tool results should be processed before being sent back to the LLM.

The system should normalize, summarize, filter, and validate results, especially when outputs are large, sensitive, partial, or inconsistent.

1️⃣2️⃣ Validation and Retry

Validation Types

Validate:

Tool arguments
Tool permissions
Tool output schema
Business rules
Safety policy
Final answer correctness

Retry Strategy

Retry only when safe.

Timeout → retry
Invalid argument → ask model to fix
Permission denied → do not retry
Dangerous action → require approval

Avoid Infinite Loops

Use:

Max retry count
Max tool calls
Max agent steps
Cost budgets

👉 Interview Answer

Tool calling systems need validation and retry logic.

But retries should be controlled.

Some errors are retryable, while permission or safety errors should stop execution immediately.

1️⃣3️⃣ Observability

What to Log

User request ID
Prompt version
Model version
Tool name
Tool arguments
Permission decision
Execution latency
Tool status
Error type
Result size
Final outcome

Why Important?

Tool calling failures are often hard to debug.

You need to know:

Why the tool was selected
What arguments were used
Whether permission passed
What the tool returned
How the LLM used the result

👉 Interview Answer

Tool calling needs detailed observability.

I would log tool selection, arguments, permission decisions, execution latency, errors, result size, and final outcome.

Without this, tool-using agents are very difficult to debug.

1️⃣4️⃣ Common Failure Modes

Failure Modes

Tool calling can fail because:

Wrong tool selected
Bad arguments generated
Permission denied
Tool timeout
Tool returns stale data
Tool output too large
LLM misinterprets result
Write action executed incorrectly

Example

Agent calls production deployment tool
instead of staging deployment tool

Prevention

Tool allowlists
Environment constraints
Approval workflow
Output validation
Strong logging
Human-in-the-loop

👉 Interview Answer

Tool calling failures often happen at the boundary between probabilistic reasoning and deterministic systems.

The system must validate tool choices, arguments, permissions, environment, and outputs before trusting the result.

1️⃣5️⃣ Best Practices

Practical Rules

Define tools with strict schemas
Keep tools narrow and specific
Separate read and write tools
Enforce permissions outside the LLM
Validate all arguments
Normalize all outputs
Use human approval for risky actions
Add audit logs
Set step, retry, and cost limits

Design Principle

The LLM suggests actions.
The system controls execution.

👉 Interview Answer

The best tool calling systems treat the LLM as a planner, not as the authority.

The LLM can suggest tool calls, but the system must enforce schemas, permissions, safety, validation, and execution control.

🧠 Staff-Level Answer Final

👉 Interview Answer Full Version

Tool calling is the architecture that allows LLM systems to interact with external systems.

Without tools, an LLM can only generate text from its model knowledge, which creates hallucination risk and prevents it from using real-time or private data.

With tool calling, the model can decide that it needs an external capability, generate a structured tool request, and use the returned result to produce a grounded answer or continue an agent workflow.

In production, I would not let the LLM directly execute tools.

The LLM should only propose a structured tool call.

The application layer should validate the tool name, arguments, permissions, safety policy, environment, and business rules before execution.

A typical architecture includes a prompt builder, LLM, tool router, permission layer, tool executor, external systems, result validator, and final response generator.

I would also separate read tools from write tools.

Read tools retrieve information, such as documents, metrics, logs, or database records.

Write tools modify state, such as sending emails, updating databases, creating tickets, triggering deployments, or issuing refunds.

Write tools require stronger safeguards: idempotency, audit logs, approval workflows, and deterministic validation.

Tool results should also be processed before going back to the model.

The system should normalize, summarize, filter sensitive data, mark partial failures, and validate output schemas.

The main risks are wrong tool selection, bad arguments, permission bypass, stale data, tool timeout, large outputs, and the LLM misinterpreting tool results.

So production tool calling needs strong observability, including prompt version, model version, tool name, arguments, permission decision, latency, error type, result size, and final outcome.

The core principle is: the LLM suggests actions, but the system controls execution.

⭐ Final Insight

Tool Calling 的核心不是“让 LLM 随便调用 API”。

真正的核心是：

LLM proposes.

System validates.

Backend executes.

Guardrails control.

Production 中最重要的原则是：

Model 不应该拥有执行权。

Model 只负责生成 structured intent。

真正的权限、验证、安全、执行和审计，必须由 application layer 和 backend system 控制。

中文部分

🎯 Tool Calling Architecture in LLM Systems

1️⃣ 核心框架

讨论 LLM Systems 中的 Tool Calling Architecture 时，我通常从这些方面分析：

为什么 LLM 需要 tools
Tool definition and schema
Tool selection
Tool execution layer
Permission and safety checks
Tool result handling
Validation and retries
核心权衡：flexibility vs control

2️⃣ 为什么 LLM 需要 Tool Calling？

LLM 擅长语言和 reasoning，但它不能可靠地在内部完成所有事情。

它不能直接：

查询 production databases
访问 real-time data
发送 emails
调用 internal APIs
安全执行 code
更新 business systems
验证当前系统状态

Without Tools

User asks question
→ LLM guesses from model knowledge
→ Risk of hallucination

With Tools

User asks question
→ LLM decides tool is needed
→ System calls tool
→ Tool returns result
→ LLM answers using real data

👉 面试回答

Tool calling 让 LLM system 可以和 external systems 交互。

Model 可以判断自己需要 tool，生成 structured arguments，然后使用返回结果生成 grounded answer。

这让 LLM 从 text generator 变成 action-capable system。

3️⃣ 什么是 Tool？

Tool Definition

Tool 是暴露给 LLM system 的外部能力。

Examples:

Search documents
Query database
Call payment API
Read calendar
Send email
Execute code
Retrieve logs
Create ticket

Tool Schema

一个 tool 通常包含：

Name
Description
Input schema
Output schema
Permission scope
Timeout policy
Retry policy

Example Tool Schema

{
  "name": "search_incidents",
  "description": "Search historical incidents for a service",
  "input_schema": {
    "service": "string",
    "time_range": "string",
    "severity": "string"
  },
  "output_schema": {
    "incidents": "array",
    "count": "number"
  }
}

👉 面试回答

Tool 是 LLM 和 external system 之间的 controlled interface。

在 production 中， tools 应该有清晰的 schema、 description、permissions、timeouts、 retries 和 output contracts。

4️⃣ High-Level Tool Calling Architecture

Architecture

User Request
→ Prompt Builder
→ LLM
→ Tool Call Request
→ Tool Router
→ Permission Check
→ Tool Executor
→ External System
→ Tool Result
→ Result Validator
→ LLM
→ Final Response

Key Components

LLM

判断是否需要 tool。

Tool Router

把 tool name 映射到正确实现。

Permission Layer

检查 user 或 agent 是否可以调用这个 tool。

Tool Executor

真正调用 external API 或 service。

Result Validator

检查 tool output 是否 valid and safe。

👉 面试回答

Production tool calling architecture 不应该让 LLM 直接执行动作。

LLM 只提出 structured tool call。

Application 负责 permission validation、 routing、tool execution、result validation，然后把结果返回给 model。

5️⃣ Tool Selection

Tool Selection 如何工作？

LLM 接收 tool descriptions。

然后决定：

是否需要 tool？
应该使用哪个 tool？
应该传什么 arguments？
是否需要多个 tools？

Example

User asks:
"Why did the payment API latency spike yesterday?"

LLM selects:
- query_metrics
- search_logs
- search_deployments
- search_incidents

Tool Selection Risk

LLM 可能会选择：

Wrong tool
Wrong arguments
Too many tools
Dangerous tools
需要 tool 但没有调用

👉 面试回答

Tool selection 通常由 LLM 根据 tool descriptions 和 task context 来完成。

但因为 tool selection 可能出错， production systems 应该限制可用 tools、验证 arguments，并在高风险 workflow 中使用 deterministic routing。

6️⃣ Tool Argument Generation

为什么 Arguments 很重要？

即使选对了 tool， arguments 也可能是错的。

Example Bad Argument

{
  "service": "all_services",
  "time_range": "last_10_years"
}

这可能太贵，也可能不安全。

Controls

JSON schema validation
Required fields
Type validation
Range limits
Allowlisted values
Business rule checks

👉 面试回答

Tool arguments 不能被盲目信任。

系统应该在执行 tool 前，用 schemas、allowed values、 permission rules 和 business constraints 验证 generated arguments。

7️⃣ Tool Router

什么是 Tool Router？

Tool router 把 tool call request 映射到正确 backend implementation。

Tool name: search_logs
→ Logs API client

Tool name: query_metrics
→ Metrics service client

Tool name: create_ticket
→ Ticketing system client

Router Responsibilities

找到正确 tool implementation
验证 tool availability
应用 environment-specific routing
添加 authentication
执行 timeout policy
返回 standardized result

👉 面试回答

Tool router 是把 model-generated tool request 映射到真实 backend implementation 的层。

它应该标准化 execution，处理 authentication，执行 timeouts，并向 agent 返回 structured results。

8️⃣ Permission and Safety Layer

为什么必须有 Permission Layer？

Tools 可能执行真实动作。

如果没有 permission checks， agent 可能：

读取 unauthorized data
写入 production systems
发送错误 emails
删除 resources
暴露 sensitive data

Permission Checks

检查：

User identity
Agent role
Tool scope
Data sensitivity
Environment
Read vs write access
Approval requirement

Safer Pattern

LLM proposes tool call
→ System validates permission
→ Tool executes only if allowed

👉 面试回答

LLM 不应该是 tool permission 的权威来源。

Permissions 必须由 application layer 执行，基于 user identity、agent role、 tool scope、data sensitivity 和 environment。

9️⃣ Read Tools vs Write Tools

Read Tools

Read tools 负责读取信息。

Examples:

Search documents
Query metrics
Read database
Fetch logs
List tickets

风险较低，但仍然需要 access control。

Write Tools

Write tools 会改变系统状态。

Examples:

Send email
Create ticket
Update database
Trigger deployment
Issue refund

风险更高。

Rule

Read tools → permission checks
Write tools → permission checks + validation + approval

👉 面试回答

我会把 tools 分成 read tools 和 write tools。

Read tools 也需要 access control，但 write tools 需要更强 safeguards，比如 idempotency、validation、 audit logs，有时还需要 human approval。

🔟 Tool Execution

Execution Flow

Tool call request
→ Validate schema
→ Check permissions
→ Apply timeout
→ Execute API call
→ Normalize response
→ Validate output
→ Return result

Important Controls

Timeout
Retry
Circuit breaker
Rate limit
Idempotency key
Audit log
Error mapping

Tool Failure Example

Metrics API times out
→ Tool executor returns structured error
→ Agent decides retry, fallback, or explain limitation

👉 面试回答

Tool execution 应该由 controlled execution layer 处理。

这一层负责 schema validation、 permissions、timeouts、retries、 rate limits、idempotency 和 standardized error handling。

1️⃣1️⃣ Tool Result Handling

为什么 Result Handling 很重要？

Tool results 可能是：

Too large
Partial
Empty
Stale
Invalid
Sensitive
Contradictory

Result Processing

返回给 LLM 前，应该先：

Normalize
Summarize
Filter sensitive data
Validate schema
Add metadata
Mark partial failures

Example

Raw log results: 10,000 lines
→ Summarize top error patterns
→ Send compact result to LLM

👉 面试回答

Tool results 在返回给 LLM 前应该被处理。

系统应该 normalize、summarize、 filter 和 validate results，特别是当 outputs 很大、敏感、部分失败或不一致时。

1️⃣2️⃣ Validation and Retry

Validation Types

需要验证：

Tool arguments
Tool permissions
Tool output schema
Business rules
Safety policy
Final answer correctness

Retry Strategy

只有安全时才 retry。

Timeout → retry
Invalid argument → ask model to fix
Permission denied → do not retry
Dangerous action → require approval

Avoid Infinite Loops

Use:

Max retry count
Max tool calls
Max agent steps
Cost budgets

👉 面试回答

Tool calling systems 需要 validation 和 retry logic。

但 retry 必须受控。

有些 errors 是 retryable，而 permission 或 safety errors 应该立即停止执行。

1️⃣3️⃣ Observability

What to Log

User request ID
Prompt version
Model version
Tool name
Tool arguments
Permission decision
Execution latency
Tool status
Error type
Result size
Final outcome

为什么重要？

Tool calling failures 通常很难 debug。

你需要知道：

为什么选择这个 tool
使用了什么 arguments
permission 是否通过
tool 返回了什么
LLM 如何使用结果

👉 面试回答

Tool calling 需要详细 observability。

我会记录 tool selection、arguments、 permission decisions、execution latency、 errors、result size 和 final outcome。

否则 tool-using agents 会非常难 debug。

1️⃣4️⃣ Common Failure Modes

Failure Modes

Tool calling 可能因为这些原因失败：

Wrong tool selected
Bad arguments generated
Permission denied
Tool timeout
Tool returns stale data
Tool output too large
LLM misinterprets result
Write action executed incorrectly

Example

Agent calls production deployment tool
instead of staging deployment tool

Prevention

Tool allowlists
Environment constraints
Approval workflow
Output validation
Strong logging
Human-in-the-loop

👉 面试回答

Tool calling failures 通常发生在 probabilistic reasoning 和 deterministic systems 的边界。

系统必须在信任结果之前验证 tool choice、 arguments、permissions、environment 和 outputs。

1️⃣5️⃣ Best Practices

Practical Rules

Define tools with strict schemas
Keep tools narrow and specific
Separate read and write tools
Enforce permissions outside the LLM
Validate all arguments
Normalize all outputs
Use human approval for risky actions
Add audit logs
Set step, retry, and cost limits

Design Principle

The LLM suggests actions.
The system controls execution.

👉 面试回答

最好的 tool calling systems 把 LLM 当作 planner，而不是 authority。

LLM 可以 suggest tool calls，但系统必须执行 schemas、permissions、 safety、validation 和 execution control。

🧠 Staff-Level Answer Final

👉 面试回答完整版本

Tool calling 是让 LLM systems 能够和 external systems 交互的架构。

没有 tools， LLM 只能基于 model knowledge 生成文本，这会带来 hallucination risk，也无法使用 real-time data 或 private data。

有了 tool calling， model 可以判断自己需要一个 external capability，生成 structured tool request，并使用返回结果生成 grounded answer 或继续 agent workflow。

在 production 中，我不会让 LLM 直接执行 tools。

LLM 只应该提出 structured tool call。

Application layer 应该在执行前验证 tool name、 arguments、permissions、safety policy、 environment 和 business rules。

典型架构包括 prompt builder、LLM、 tool router、permission layer、 tool executor、external systems、 result validator 和 final response generator。

我也会区分 read tools 和 write tools。

Read tools 负责检索信息，例如 documents、metrics、logs 或 database records。

Write tools 会修改状态，比如 sending emails、updating databases、 creating tickets、triggering deployments 或 issuing refunds。

Write tools 需要更强的 safeguards： idempotency、audit logs、approval workflows 和 deterministic validation。

Tool results 在返回给 model 前也需要处理。

系统应该 normalize、summarize、 filter sensitive data、mark partial failures，并 validate output schemas。

主要风险包括 wrong tool selection、 bad arguments、permission bypass、 stale data、tool timeout、large outputs，以及 LLM misinterpreting tool results。

所以 production tool calling 需要强 observability，包括 prompt version、model version、 tool name、arguments、permission decision、 latency、error type、result size 和 final outcome。

核心原则是： LLM suggests actions， but the system controls execution。

⭐ Final Insight

Tool Calling 的核心不是“让 LLM 随便调用 API”。

真正的核心是：

LLM proposes.

System validates.

Backend executes.

Guardrails control.

Production 中最重要的原则是：

Model 不应该拥有执行权。

Model 只负责生成 structured intent。

真正的权限、验证、安全、执行和审计，必须由 application layer 和 backend system 控制。

📌 Staff Memorization Pack

30-Second Answer

Tool calling lets an LLM interact with real systems, but production tool calling must be schema-bound, permissioned, validated, observable, and safe to retry.

In production, I would design it with explicit boundaries around planning, execution, validation, permissions, state, observability, and fallback behavior.

2-Minute Staff Answer

For Tool Calling Architecture in LLM Systems, I would start by separating the model’s reasoning role from the system’s execution guarantees.

The LLM can interpret ambiguous intent, produce plans, choose tools, summarize context, and adapt to observations. But the surrounding platform must enforce deterministic controls: schemas, permissions, timeouts, retries, idempotency, audit logging, and policy checks.

My design would include a clear orchestration layer, bounded tool access, managed state, validation after important steps, and human approval for high-risk actions. I would also add tracing for every model call, tool call, decision point, and failure so the system can be debugged and improved.

The staff-level trade-off is autonomy versus control. More autonomy improves flexibility, but it increases cost, latency, unpredictability, and safety risk. A production design should give the agent enough freedom to solve ambiguous tasks while keeping irreversible or correctness-critical actions inside deterministic backend systems.

Architecture Points to Memorize

Tool registry defines names, schemas, descriptions, and ownership
Planner decides when a tool is needed
Tool router validates the selected tool and arguments
Policy engine checks user permission and action risk
Executor calls the external API or internal service
Result normalizer converts output into model-readable context
Validator checks result correctness and safety
Audit log records the call, arguments, result summary, and user intent

Failure Modes to Call Out

wrong tool choice
invalid arguments
unsafe side effects
non-idempotent retries
tool output prompt injection
permission escalation
schema drift
hidden latency and cost

Guardrails and Controls

A strong production answer should mention:

tool allowlists and per-tool permissions
input and output schema validation
max step limits and cost budgets
timeout and retry policy
idempotency keys for side-effecting actions
human approval for high-risk operations
prompt, model, and tool version tracking
agent trace logging
evaluation datasets and regression tests
fallback to deterministic backend or manual review

Common Follow-up Questions

How do you make it reliable?

I would constrain the action space, validate every tool call, make side effects idempotent, add step limits, log full traces, and convert production failures into eval cases. Reliability comes from the system around the model, not from trusting the model blindly.

How do you control cost and latency?

I would use smaller models for simple steps, cache stable context, limit retrieval size, set max iterations, parallelize safe independent work, and stop early when confidence is high enough. I would track cost per task, tokens per step, tool latency, and timeout rate.

How do you handle unsafe actions?

I would classify actions by risk. Read-only actions can be more automated, but writes, money movement, permission changes, deletion, external communication, and compliance-sensitive actions should require deterministic validation or human approval.

How do you debug failures?

I would inspect the agent trace: user goal, prompt version, retrieved context, plan, tool calls, observations, validation results, and final output. Without step-level traces, agent failures are almost impossible to debug at production quality.

中文背诵版

Tool Calling Architecture in LLM Systems 的 Staff 级回答，核心不是说模型有多聪明，而是说怎么把 agent 做成可控的生产系统。

LLM 负责理解目标、拆解任务、选择工具、总结上下文和根据观察调整计划。但是 deterministic backend 必须负责权限、schema 校验、业务规则、幂等、事务、审计和合规。

我会把系统拆成 orchestrator、planner、tool router、execution layer、memory/state store、validator、guardrails、observability 和 fallback path。每一步都要有 trace，每个 tool call 都要有权限和参数校验，高风险动作要有人审或 deterministic validation。

Staff 级 trade-off 是 autonomy versus control。 Autonomy 越高，系统越灵活，但 latency、cost、debug 难度和 safety risk 也越高。所以生产设计要限制 agent 的 action space，把不可逆和 correctness-critical 的动作留给传统后端执行。

Staff-Level Final Sentence

At staff level, I would separate reasoning from execution. The LLM may propose a tool call, but deterministic infrastructure should validate schema, permission, idempotency, timeout, retry policy, and audit logging.