System Design Deep Dive - 07 Planning vs Execution in AI Agents

Post by ailswan May. 24, 2026

🎯 Planning vs Execution in AI Agents


1️⃣ Core Framework

When discussing Planning vs Execution in AI Agents, I frame it as:

  1. What planning means
  2. What execution means
  3. Why they should be separated
  4. Planner architecture
  5. Executor architecture
  6. Feedback loops
  7. Validation and guardrails
  8. Trade-offs: flexibility vs reliability

2️⃣ What Is Planning?

Planning is the process where the agent decides what steps are needed to achieve a goal.

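The planner answers questions such as: what does the user want, which steps are needed, which strategy fits, and which tools are required?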


Planning Flow

User Goal
→ Understand intent
→ Break goal into steps
→ Choose strategy
→ Identify tools
→ Create execution plan

👉 Interview Answer

Planning is the reasoning phase of an AI agent.

The agent analyzes the user goal, decomposes it into smaller steps, decides which tools may be needed, and creates a strategy for completing the task.


3️⃣ What Is Execution?


Execution Definition

Execution is the process of carrying out the plan.

The executor performs actions such as calling tools, retrieving data, running code, validating results, and producing the final output.


Execution Flow

Plan
→ Select next action
→ Validate action
→ Call tool
→ Observe result
→ Store output
→ Continue or stop

👉 Interview Answer

Execution is the action phase of an AI agent.

It takes the plan and performs concrete operations, such as calling tools, retrieving data, running code, validating results, and producing the final output.


4️⃣ Why Separate Planning and Execution?


Key Reason

Planning and execution have different responsibilities.

Planning = decide what should happen
Execution = safely make it happen

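Why Separation Helps

Plans can be inspected and validated before anything runs, which makes the agent easier to control, easier to debug, and lower risk.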

Bad Design

LLM thinks and executes freely
→ Hard to control
→ Hard to debug
→ Higher risk

Better Design

Planner creates plan
→ System validates plan
→ Executor runs approved steps
→ Results are observed and validated

👉 Interview Answer

Planning and execution should be separated because reasoning and action have different risk profiles.

The planner can propose steps, but the executor should validate permissions, enforce constraints, call tools safely, and handle failures.

This makes the system more reliable and controllable.


5️⃣ Planner Component


Planner Responsibilities

The planner is responsible for defining the goal, the steps, the required tools, the risk level, and the success criteria.


Example Plan

{
  "goal": "Investigate payment API latency spike",
  "steps": [
    "Query payment API latency metrics",
    "Search recent deployment history",
    "Search error logs",
    "Compare with previous incidents",
    "Summarize likely root cause"
  ],
  "risk_level": "read_only",
  "success_criteria": "Identify likely cause with supporting evidence"
}

👉 Interview Answer

The planner creates a structured strategy for solving the task.

A good planner should define the goal, steps, required tools, risk level, and success criteria before execution begins.
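
To make the plan an explicit artifact rather than free text, it can be parsed into a typed object before anything executes. The following is a minimal sketch: the `Plan` class, the `parse_plan` helper, and every risk level other than `read_only` are assumptions for illustration, not part of any specific framework.

```python
import json
from dataclasses import dataclass

# Assumed labels; only "read_only" appears in the example plan.
ALLOWED_RISK_LEVELS = {"read_only", "write", "destructive"}
REQUIRED_FIELDS = {"goal", "steps", "risk_level", "success_criteria"}


@dataclass(frozen=True)
class Plan:
    goal: str
    steps: tuple
    risk_level: str
    success_criteria: str


def parse_plan(raw: str) -> Plan:
    """Parse planner output and reject structurally invalid plans."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("plan must be a JSON object")
    missing = REQUIRED_FIELDS - set(data)
    if missing:
        raise ValueError(f"plan is missing fields: {sorted(missing)}")
    if not isinstance(data["steps"], list) or not data["steps"]:
        raise ValueError("plan must contain at least one step")
    if data["risk_level"] not in ALLOWED_RISK_LEVELS:
        raise ValueError(f"unknown risk level: {data['risk_level']!r}")
    return Plan(
        goal=data["goal"],
        steps=tuple(data["steps"]),
        risk_level=data["risk_level"],
        success_criteria=data["success_criteria"],
    )


raw_plan = """
{
  "goal": "Investigate payment API latency spike",
  "steps": ["Query payment API latency metrics", "Search error logs"],
  "risk_level": "read_only",
  "success_criteria": "Identify likely cause with supporting evidence"
}
"""
plan = parse_plan(raw_plan)
print(plan.risk_level, len(plan.steps))
```

Rejecting a malformed plan at this boundary means the executor never has to guess what the planner meant.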


6️⃣ Executor Component


Executor Responsibilities

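The executor is responsible for validating each step, enforcing permissions, executing tools, handling retries and timeouts, normalizing results, and stopping unsafe or invalid actions.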


Executor Flow

Receive step
→ Validate step
→ Check permission
→ Execute tool
→ Normalize result
→ Store observation
→ Return to planner

Important Point

The executor should not blindly trust the planner.


👉 Interview Answer

The executor is the controlled action layer.

It should validate each step, enforce permissions, execute tools, handle retries and timeouts, normalize results, and stop unsafe or invalid actions.
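
The executor flow above can be sketched roughly as follows. This is an illustration under assumptions: `Step`, `Observation`, `Executor`, and the tool names are hypothetical, and the sketch only retries when a tool raises `TimeoutError`; it does not enforce a wall-clock timeout itself.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class Step:
    tool: str
    args: dict = field(default_factory=dict)


@dataclass
class Observation:
    ok: bool
    output: Any = None
    error: Optional[str] = None


class Executor:
    def __init__(self, tools: dict, allowed_tools: set, max_retries: int = 2):
        self.tools = tools                  # tool name -> callable
        self.allowed_tools = allowed_tools  # what this agent may call
        self.max_retries = max_retries

    def run(self, step: Step) -> Observation:
        # Validate step: the tool must exist.
        if step.tool not in self.tools:
            return Observation(ok=False, error=f"unknown tool: {step.tool}")
        # Check permission: the planner's request is not trusted by default.
        if step.tool not in self.allowed_tools:
            return Observation(ok=False, error=f"tool not permitted: {step.tool}")
        last_error = ""
        # Execute with a bounded number of retries on timeouts only.
        for _ in range(self.max_retries + 1):
            try:
                result = self.tools[step.tool](**step.args)
                # Normalize result into a uniform observation.
                return Observation(ok=True, output=result)
            except TimeoutError as exc:
                last_error = f"timeout: {exc}"
            except Exception as exc:
                # Anything else (for example invalid arguments) is not retried.
                return Observation(ok=False, error=f"{type(exc).__name__}: {exc}")
        return Observation(ok=False, error=last_error)


def query_metrics(service: str) -> dict:
    return {"service": service, "p95_ms": 840}  # made-up value for the demo


def delete_database() -> None:
    raise RuntimeError("should never run in this demo")


executor = Executor(
    tools={"query_metrics": query_metrics, "delete_database": delete_database},
    allowed_tools={"query_metrics"},
)
print(executor.run(Step("query_metrics", {"service": "payment-api"})))
print(executor.run(Step("delete_database")))
```

The second call returns a failed observation without ever invoking the tool, which is the point of not trusting the planner blindly.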


7️⃣ Plan-Execute-Observe Loop


Core Loop

Goal
→ Plan
→ Execute
→ Observe
→ Re-plan
→ Continue
→ Final Answer

Why Observation Matters

The agent must learn from execution results.

Example:

Plan: Query logs for error code 500
Execution: Log query runs
Observation: No 500 errors found
Re-plan: Try latency metrics instead

Dynamic Re-planning

The agent may update the plan when new evidence arrives, for example when a step fails or returns an empty or unexpected result.


👉 Interview Answer

Real agents usually need a plan-execute-observe loop.

After each execution step, the agent observes the result, updates its understanding, and may revise the plan.

This makes the workflow adaptive but also more expensive and harder to control.
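
A bounded version of this loop might look like the sketch below. The planner and executor are stubs that replay the log-query example above; `run_agent`, `plan_next`, `execute`, and the `MAX_STEPS` budget are all assumed names and values, and a real planner would be an LLM call.

```python
MAX_STEPS = 5  # assumed step budget; tune per workflow


def run_agent(goal, plan_next, execute, max_steps=MAX_STEPS):
    """Plan one step, execute it, record the observation, repeat."""
    observations = []
    for _ in range(max_steps):
        step = plan_next(goal, observations)  # plan or re-plan
        if step is None:                      # planner decides it is done
            break
        result = execute(step)                # execute
        observations.append((step, result))   # observe
    return observations


def plan_next(goal, observations):
    # Stub planner: start with the 500-error query, then re-plan once.
    if not observations:
        return "query_logs_500"
    last_step, last_result = observations[-1]
    if last_step == "query_logs_500" and last_result == "no 500 errors found":
        return "query_latency_metrics"
    return None


def execute(step):
    # Stub executor with canned results for the demo.
    fake_results = {
        "query_logs_500": "no 500 errors found",
        "query_latency_metrics": "p99 latency rose after deploy",
    }
    return fake_results[step]


history = run_agent("Investigate payment API latency spike", plan_next, execute)
for step, result in history:
    print(step, "->", result)
```

The `max_steps` bound is what keeps an adaptive loop from turning into an unbounded one: even a planner that never returns `None` stops after the budget is spent.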


8️⃣ Single-Shot Planning vs Iterative Planning


Single-Shot Planning

The agent creates the full plan upfront.

Goal
→ Full plan
→ Execute all steps
→ Final answer

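Pros

Cheaper and easier to control; a good fit for predictable tasks.

Cons

Cannot adapt when a step returns something unexpected, so it fits uncertain or exploratory tasks poorly.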

Iterative Planning

The agent plans step by step.

Goal
→ Plan next step
→ Execute
→ Observe
→ Plan next step

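Pros

Adapts to each observation; better for uncertain or exploratory tasks.

Cons

Higher latency, cost, and orchestration complexity.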

👉 Interview Answer

Single-shot planning is useful for predictable tasks because it is cheaper and easier to control.

Iterative planning is better for uncertain or exploratory tasks, but it increases latency, cost, and orchestration complexity.


9️⃣ Planning Granularity


Coarse-Grained Plan

Investigate incident
→ Gather data
→ Analyze evidence
→ Summarize result

Good for flexibility.


Fine-Grained Plan

Step 1: Query p95 latency
Step 2: Query p99 latency
Step 3: Search logs for timeout errors
Step 4: Search deployments in last 24h

Good for control.


Trade-off

Plan Type      | Strength           | Weakness
Coarse-grained | Flexible           | Less controllable
Fine-grained   | Easier to validate | Less adaptive

👉 Interview Answer

Planning granularity is an important design choice.

Coarse-grained plans give the agent more flexibility, while fine-grained plans are easier to validate and control.

For high-risk workflows, I prefer more explicit and fine-grained planning.


🔟 Validation Between Planning and Execution


Why Validate Plans?

Plans can be wrong or unsafe.

Before execution, the system should check permissions, tool scope, risk level, cost, and safety policy.


Validation Flow

Planner proposes plan
→ Policy engine validates
→ Human approval if needed
→ Executor runs approved steps

👉 Interview Answer

Plans should be validated before execution.

The system should check permissions, tool scope, risk level, cost, and safety policy.

High-risk actions should require approval before the executor runs them.
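
The validation flow can be sketched as a small policy function that sits between planner and executor. Everything here is an assumption for illustration: the `Decision` values, the `PlannedStep` fields, the risk labels, and the cost units are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    APPROVED = "approved"
    NEEDS_HUMAN_APPROVAL = "needs_human_approval"
    REJECTED = "rejected"


@dataclass(frozen=True)
class PlannedStep:
    tool: str
    risk_level: str        # assumed labels, e.g. "read_only" or "write"
    estimated_cost: float  # arbitrary cost units


def validate_plan(steps, allowed_tools, cost_budget):
    """Check tool scope and cost first, then route risky plans to a human."""
    if any(s.tool not in allowed_tools for s in steps):
        return Decision.REJECTED
    if sum(s.estimated_cost for s in steps) > cost_budget:
        return Decision.REJECTED
    if any(s.risk_level != "read_only" for s in steps):
        return Decision.NEEDS_HUMAN_APPROVAL
    return Decision.APPROVED


read_only_plan = [
    PlannedStep("query_metrics", "read_only", 1.0),
    PlannedStep("search_logs", "read_only", 2.0),
]
write_plan = read_only_plan + [PlannedStep("restart_service", "write", 1.0)]
allowed = {"query_metrics", "search_logs", "restart_service"}

print(validate_plan(read_only_plan, allowed, cost_budget=10.0))
print(validate_plan(write_plan, allowed, cost_budget=10.0))
```

Because the check is deterministic code rather than a model call, the same plan always gets the same decision, which makes the policy itself testable.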


1️⃣1️⃣ Human-in-the-Loop


When Needed?

Human approval is useful when the agent wants to change production systems, move money, change permissions, or communicate with customers.


Pattern

Agent creates plan
→ Human reviews plan
→ Human approves
→ Executor runs action

Why Important?

It separates recommendation from action.


👉 Interview Answer

Human-in-the-loop is important for high-risk execution.

The agent can create a plan or recommendation, but actions that change production systems, money, permissions, or customer communication should require human approval.
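
One way to keep recommendation and action apart is an approval gate that holds high-risk actions until a human signs off. This is a minimal sketch; `ApprovalGate`, the category names, and the action IDs are hypothetical.

```python
from dataclasses import dataclass, field

# Assumed category labels, taken from the high-risk areas named above.
HIGH_RISK = {
    "production_change",
    "money_movement",
    "permission_change",
    "customer_communication",
}


@dataclass
class ApprovalGate:
    approved_ids: set = field(default_factory=set)

    def approve(self, action_id: str) -> None:
        """Called by a human reviewer, never by the agent."""
        self.approved_ids.add(action_id)

    def run(self, action_id: str, category: str, action):
        # High-risk actions stay pending until explicitly approved.
        if category in HIGH_RISK and action_id not in self.approved_ids:
            return "pending_approval"
        return action()


sent = []


def send_refund_email():
    sent.append("refund email")
    return "done"


gate = ApprovalGate()
first = gate.run("a1", "customer_communication", send_refund_email)
gate.approve("a1")
second = gate.run("a1", "customer_communication", send_refund_email)
print(first, second, sent)
```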


1️⃣2️⃣ Planning Failure Modes


Common Planning Failures

Wrong goals, missing steps, bad tool choices, unsafe plans, and over-planning.


Example

User asks for account summary
Agent creates a 12-step research workflow

This is over-planning.


Controls

Plan validation, step limits, and fallback paths.


👉 Interview Answer

Planning can fail when the agent misunderstands the goal, skips necessary steps, chooses wrong tools, or creates an overly complex plan.

Production systems need plan validation, step limits, and fallback paths.
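
A step limit with a fallback path can be as simple as the sketch below, which replays the over-planning example above. The limit of 6 steps, the tool names, and both helper functions are assumptions for illustration.

```python
MAX_PLAN_STEPS = 6  # assumed limit


def check_plan(steps, known_tools, max_steps=MAX_PLAN_STEPS):
    """Return a list of problems; an empty list means the plan passes."""
    problems = []
    if not steps:
        problems.append("plan has no steps")
    if len(steps) > max_steps:
        problems.append(f"plan has {len(steps)} steps, limit is {max_steps}")
    unknown = [tool for tool in steps if tool not in known_tools]
    if unknown:
        problems.append(f"unknown tools: {unknown}")
    return problems


def plan_or_fallback(steps, known_tools, fallback):
    """Use the proposed plan only if it passes; otherwise use the fallback."""
    return fallback if check_plan(steps, known_tools) else steps


# The user asked for an account summary; the planner produced 12 steps.
over_planned = [f"research_step_{i}" for i in range(12)]
fallback = ["fetch_account_summary"]
chosen = plan_or_fallback(
    over_planned, known_tools={"fetch_account_summary"}, fallback=fallback
)
print(chosen)
```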


1️⃣3️⃣ Execution Failure Modes


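Common Execution Failures

Tool timeouts, invalid arguments, permission errors, stale data, duplicate actions, and idempotency failures.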


Example

Executor retries a write operation without idempotency
→ Duplicate ticket created

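Controls

Timeouts, bounded retries, permission checks, argument validation, idempotency for writes, and structured error handling.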


👉 Interview Answer

Execution can fail because tools and external systems are unreliable.

The executor should handle timeouts, retries, permission failures, invalid arguments, idempotency, and structured error handling.
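
The duplicate-ticket example above is usually avoided with an idempotency key: one key per logical action, reused on every retry. The sketch below uses an in-memory stand-in for the ticketing system; `TicketService` and `create_ticket_with_retry` are hypothetical names, and the simulated failure is a response lost after the write already succeeded.

```python
import uuid


class TicketService:
    """In-memory stand-in for an external ticketing API (hypothetical)."""

    def __init__(self):
        self.tickets = {}
        self._fail_next = True

    def create_ticket(self, title, idempotency_key):
        # A repeated key does not create a second ticket.
        if idempotency_key not in self.tickets:
            self.tickets[idempotency_key] = {
                "id": len(self.tickets) + 1,
                "title": title,
            }
        if self._fail_next:
            # Simulate the response being lost after the write succeeded.
            self._fail_next = False
            raise TimeoutError("response lost")
        return self.tickets[idempotency_key]


def create_ticket_with_retry(service, title, max_attempts=3):
    key = str(uuid.uuid4())  # generated once, reused across retries
    for _ in range(max_attempts):
        try:
            return service.create_ticket(title, idempotency_key=key)
        except TimeoutError:
            continue
    raise RuntimeError("ticket creation failed after retries")


service = TicketService()
ticket = create_ticket_with_retry(service, "Payment API latency spike")
print(ticket["id"], len(service.tickets))
```

Generating the key inside the retry loop instead would defeat the purpose, because each attempt would look like a new action to the service.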


1️⃣4️⃣ Observability


What to Log

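For planning: user goal, prompt version, retrieved context, and each plan version.

For execution: tool calls, observations, validation results, errors, and the final output.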


Why Important?

You need to answer:

Was the plan wrong,
or did execution fail?

👉 Interview Answer

Observability should separate planning traces from execution traces.

This helps determine whether a failure came from bad reasoning, bad tool selection, permission issues, or external system failure.
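
A minimal sketch of phase-tagged trace records, assuming a simple in-memory list as the trace store. The field names (`phase`, `event`, `status`) and the helper functions are hypothetical; a real system would likely use a tracing backend instead.

```python
import json
import time

trace = []  # stand-in for a trace store


def log_event(phase, event, **fields):
    """Append one structured record tagged with its phase."""
    record = {"ts": time.time(), "phase": phase, "event": event, **fields}
    trace.append(record)
    return record


def failed_phases(records):
    """Answer the question: was the plan wrong, or did execution fail?"""
    return sorted({r["phase"] for r in records if r.get("status") == "error"})


log_event("planning", "plan_created", plan_version=1, step_count=3, status="ok")
log_event("execution", "tool_call", tool="query_metrics", status="ok", latency_ms=120)
log_event("execution", "tool_call", tool="search_logs", status="error", error="timeout")

print(failed_phases(trace))
print(json.dumps(trace[-1], indent=2))
```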


1️⃣5️⃣ Best Practices


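Practical Rules

Set max step and retry limits, classify actions by risk, check permissions on every tool call, require human approval for high-risk actions, make writes idempotent, and log planning and execution separately.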

Design Principle

Planner decides what to do.
Executor controls what is allowed to happen.

👉 Interview Answer

The best agent systems separate planning from execution.

The planner reasons about the goal and proposes steps.

The executor safely performs approved actions with permission checks, validation, retries, timeouts, and audit logs.


🧠 Staff-Level Final Answer


👉 Interview Answer (Full Version)

Planning and execution are two separate responsibilities in AI agent systems.

Planning is the reasoning phase. The agent understands the user goal, decomposes it into steps, chooses a strategy, identifies required tools, and defines success criteria.

Execution is the action phase. The system runs approved steps, calls tools, checks permissions, handles retries, records observations, and stops unsafe actions.

I prefer separating these two concerns because reasoning and action have different risk profiles.

The planner can propose what should happen, but the executor must control what is actually allowed to happen.

A common production pattern is:

user goal → planner creates structured plan → policy engine validates the plan → executor runs approved steps → tool results are observed → planner may revise the plan → final answer is generated.

This plan-execute-observe loop makes agents adaptive.

But it also increases latency, cost, and complexity.

So we need constraints: max step limits, retry limits, risk classification, permission checks, human approval for high-risk actions, idempotency for writes, and detailed observability.

Planning can fail through wrong goals, missing steps, bad tool choices, unsafe plans, or over-planning.

Execution can fail through tool timeouts, invalid arguments, permission errors, stale data, duplicate actions, or idempotency failures.

This is why production agent systems should log planning and execution separately.

We need to know whether a failure came from bad reasoning or bad execution.

The core principle is: the planner decides what to do, but the executor controls what is allowed to happen.


⭐ Final Insight

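Planning and execution in an AI agent should not be treated as a single concern.

Planning asks:

"What should be done?"

Execution asks:

"Can it be done safely? How is it done? What was the result?"

A truly production-ready agent does not let the LLM act freely.

Instead:

Planner proposes.

Executor validates and controls.

Tools execute safely.

Observations feed back into the next plan.

The single most important line: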

Planner decides intent.

Executor enforces reality.


📌 Staff Memorization Pack


30-Second Answer

Planning decides what should happen; execution performs bounded actions. Separating them improves reliability because plans can be inspected, constrained, validated, and retried step by step.

In production, I would design it with explicit boundaries around planning, execution, validation, permissions, state, observability, and fallback behavior.


2-Minute Staff Answer

For Planning vs Execution in AI Agents, I would start by separating the model’s reasoning role from the system’s execution guarantees.

The LLM can interpret ambiguous intent, produce plans, choose tools, summarize context, and adapt to observations. But the surrounding platform must enforce deterministic controls: schemas, permissions, timeouts, retries, idempotency, audit logging, and policy checks.

My design would include a clear orchestration layer, bounded tool access, managed state, validation after important steps, and human approval for high-risk actions. I would also add tracing for every model call, tool call, decision point, and failure so the system can be debugged and improved.

The staff-level trade-off is autonomy versus control. More autonomy improves flexibility, but it increases cost, latency, unpredictability, and safety risk. A production design should give the agent enough freedom to solve ambiguous tasks while keeping irreversible or correctness-critical actions inside deterministic backend systems.


Architecture Points to Memorize

  1. Goal interpreter normalizes user intent
  2. Planner creates steps with dependencies and success criteria
  3. Policy layer rejects risky or unsupported steps
  4. Executor performs one step at a time through tools
  5. Observer captures results and errors
  6. Replanner updates the plan after new evidence
  7. Validator checks final output and intermediate state
  8. Trace store records plan versions and execution history

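Failure Modes to Call Out

Planning failures: wrong goals, missing steps, bad tool choices, unsafe plans, and over-planning. Execution failures: tool timeouts, invalid arguments, permission errors, stale data, duplicate actions, and idempotency failures.

Guardrails and Controls

A strong production answer should mention max step limits, retry limits, risk classification, permission checks, human approval for high-risk actions, idempotency for writes, and detailed observability.
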

Common Follow-up Questions

How do you make it reliable?

I would constrain the action space, validate every tool call, make side effects idempotent, add step limits, log full traces, and convert production failures into eval cases. Reliability comes from the system around the model, not from trusting the model blindly.

How do you control cost and latency?

I would use smaller models for simple steps, cache stable context, limit retrieval size, set max iterations, parallelize safe independent work, and stop early when confidence is high enough. I would track cost per task, tokens per step, tool latency, and timeout rate.

How do you handle unsafe actions?

I would classify actions by risk. Read-only actions can be more automated, but writes, money movement, permission changes, deletion, external communication, and compliance-sensitive actions should require deterministic validation or human approval.

How do you debug failures?

I would inspect the agent trace: user goal, prompt version, retrieved context, plan, tool calls, observations, validation results, and final output. Without step-level traces, agent failures are almost impossible to debug at production quality.


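Recitation Version

The staff-level answer to Planning vs Execution in AI Agents is not about how smart the model is; it is about how to turn the agent into a controllable production system.

The LLM is responsible for understanding the goal, decomposing the task, choosing tools, summarizing context, and adjusting the plan based on observations. The deterministic backend must own permissions, schema validation, business rules, idempotency, transactions, auditing, and compliance.

I would split the system into an orchestrator, planner, tool router, execution layer, memory/state store, validator, guardrails, observability, and a fallback path. Every step needs a trace, every tool call needs permission and argument validation, and high-risk actions need human review or deterministic validation.

The staff-level trade-off is autonomy versus control. The higher the autonomy, the more flexible the system, but latency, cost, debugging difficulty, and safety risk rise with it. A production design should therefore restrict the agent's action space and leave irreversible and correctness-critical actions to the traditional backend.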


Staff-Level Final Sentence

At staff level, I would make plans explicit artifacts. A good agent system should know what step it is on, what success means, what tools are allowed, when to stop, and how to recover from partial failure.


Implement